Reduce Alternator table name length limit to 192 and fix crash when adding stream to table with very long name#24717
Conversation
|
This PR is in draft mode waiting for the pull request #24597 to get merged (it was already approved by 3 people, but was unlucky with CI). The first two commits in the draft PR are those from that other pull request and don't need to be reviewed here - and will disappear when I finally rebase. |
To be in line with what I learned so far about issues and PRs in Scylla, a mirroring issue should be required for this PR, and a relevant Also, I suggest to clarify case 1. and explicitly mention that Stream creation will fail in the case of [...]. The specific condition is named only in case 2., while it pertains both. |
Right, I'll do it.
I wrote "report a clear error", which I thought means "fail" :-) I'll change the wording to "fail with a clear error".
I don't understand what you mean... Condition for what? case 1 was about pre-existing tables - tables created before this patch. This means tables whose length are up to 222, which I wrote one line above. But I can repeat the 222 number. The "case 2" has different conditions... |
I ran so fast to do it, that I ended up going back in time and did it in the past: #24598 |
|
Now that #24597 went in, rebased to get rid of the first two commits, and marked the pull request ready to review. Also updated the cover letter. |
I mean the condition |
Ok, edited the cover letter yet again. Let me know if this is what you hoped for. |
Docs Preview 📖Docs Preview for this pull request is available here Changed Files: Note: This preview will be available for 30 days and will be automatically deleted after that period. You can manually trigger a new build by committing changes. |
| plus two, is limited to 222 characters. | ||
| This means that if you create a table whose name's length is close to 222 | ||
| characters, you may not be able to create a GSI or LSI on it. | ||
| So for example, if you create a table whose name 192 characters, you |
There was a problem hiding this comment.
| So for example, if you create a table whose name 192 characters, you | |
| So, for example, if you create a table whose name is 192 characters, you |
There was a problem hiding this comment.
Thanks, pushed a new version fixing this.
annastuchlik
left a comment
There was a problem hiding this comment.
I've added a small suggestion.
Thanks for always remembering to update the docs.
🟢 CI State: SUCCESS
Build Details:
|
ScyllaPiotr
left a comment
There was a problem hiding this comment.
Very nice patch! I only made a couple of suggestions to improve comments, which is always a subjective thing to complain about.
| // such as materialized views or CDC log tables. This second limit might | ||
| // mean that it is not possible to add a GSI or Streams to an existing | ||
| // table, because the name of the new auxiliary table may go over the | ||
| // limit. |
There was a problem hiding this comment.
This description mangles 3 terms: max_table_name_length, max_auxiliary_table_name_length and max_internal_table_name_length. I suggest to:
- explicitly clarify what means each of them, using a bullet list
- let's name them
auxilary_max...andinternal_max.... Let's put adjectives next to single words, not to allow confusion (it could be interpreted as refering to some auxilary table or internal table)
There was a problem hiding this comment.
max_internal_table_name_length was a typo - it was another name I considered for the max_auxiliary_table_name_length. I'll fix the explanation.
| // specifies that table names "names must be between 3 and 255 characters long | ||
| // and can contain only the following characters: a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot) | ||
| // validate_table_name throws the appropriate api_error if this validation fails. | ||
| // However, Alternator only allows max_table_name_length characters, not 255. |
There was a problem hiding this comment.
I suggest to repeat here, that auxiliary tables such as materialized views or CDC log tables have different limit.
There was a problem hiding this comment.
This comment is specifically about the dynamoDB "TableName", not anything else (not IndexName, not anything internal in Scylla). I'll think on how to rephrase this comment.
| } | ||
| } | ||
|
|
||
| // Validate that a CDC log table could be created for the given table name, |
There was a problem hiding this comment.
Let's change for into with. With for I was wondering for a second if you mean here the base table name.
There was a problem hiding this comment.
You're right. I'll fix the comment. It was misleading in more than just this first sentense.
There was a problem hiding this comment.
Hmm, you confused me ;-) Turns out the function really takes the base table name, NOT the log table name, so the comment was correct but clearly managed to confuse you (and now me). I'll clarify.
| // Note that if max_table_name_length is set to less than 207 (which is | ||
| // max_auxilliary_table_name_length-15), then this function will never | ||
| // fail. However, it's still important to call it in UpdateTable, in case | ||
| // we have pre-existing tables with names longer than this to avoid #24598. |
There was a problem hiding this comment.
No, that suggestion is incorrect - the point is that you can have a pre-existing table (not CDC) with name length, say, 221, and then you may enable CDC on it - and the new CDC's log name won't fit. So we still need to have this function - just to allow pre-existing tables.
There was a problem hiding this comment.
So let's rename the variable table_name below into base_table_name so no one will repeat my mistake.
| // length. To provide a more helpful error message, we assume that | ||
| // cdc::log_name() always adds a suffix of the same length. | ||
| int suffix_len = cdc::log_name(table_name).length() - table_name.length(); | ||
| throw api_error::validation(fmt::format("Streams cannot be added if the table name is longer than {} characters.", |
There was a problem hiding this comment.
Let's print the offending table name as well.
Also, let's clarify that it's the streams table name we're referring to.
There was a problem hiding this comment.
Added the table's name. The comment about "it's the streams table" is incorrect (more on that above).
| std::optional<data_dictionary::table> tbl = p.local().data_dictionary().try_find_table(keyspace_name, table_name); | ||
| if (!tbl) { | ||
| // DynamoDB returns validation error even when table does not exist | ||
| // and the table name is invalid. |
There was a problem hiding this comment.
and -> or ?
table name -> table name provided in the request
There was a problem hiding this comment.
No, "and" is correct. The point is that if the table does NOT exist, then DynamoDB still validates the name, and can print an error if it's., say, too long. It didn't have to be this way - DynamoDB could have just told you the table doesn't exist instead of doing the name length checks, but they did so we do too.
And the phrase "table name" refers to table_name in the line the comment is explaining, isn't this self-explanatory? What other "table name" can this be about?
There was a problem hiding this comment.
What other "table name" can this be about?
You're right, it's self-evident from the call validate_table_name(table_name); which table name is meant.
| } catch(data_dictionary::no_such_column_family&) { | ||
| // DynamoDB returns validation error even when table does not exist | ||
| // and the table name is invalid. | ||
| validate_table_name(table_name); |
There was a problem hiding this comment.
and -> or ?
table name -> table name provided in the request
Also, is there any risk of searching for a CDC table name without making sure first that the requested name contains only legal characters?
There was a problem hiding this comment.
I don't think there's any risk - find_schema() more or less looks up something in a hash table, and if it's not legal characters, that silly name will simply not be found.
🟢 CI State: SUCCESS
Build Details:
|
In commit d8c3b14 we fixed scylladb#12538: That issue noted that most requests which take a TableName don't need to "validate" the table's name (check that it has allowed characters and length) if the table is found in the schema. We only need to do this validation on CreateTable, or when the table is *not* found (because in that case, DynamoDB chose to print a validation error instead of table-not-found error). It turns out that the fix missed a couple of places where the name validation was unnecessary, so this patch fixes those remaining places. The original motivation for fixing was scylladb#12538 was performance, so it focused just one cheap common requests. But now, we want to be sure we fixed *all* requests, because of a new motivation: We are considering, due to scylladb#24598, to lower the maximum allowed table name length. However, when we'll do that, we'll want the new lower length limit to not apply to already existing tables. For example, it should be possible to delete a pre-existing table with DeleteTable, if it exists, without the command complaining that the name of this table is too long. So it's important to make sure that the table's name is only validated in CreateTable or if the table does not exist. Signed-off-by: Nadav Har'El <[email protected]>
Alternator has a constant, max_table_name_length=222, which is currently used for two different things: 1. Limiting the length of the name allowed for Alternator table. 2. Limiting the length of some auxiliary tables the user is not aware of, such as a materialized view (whose name is tablename:indexname) or (in the next patch) CDC log table. In principle, there is no reason why these two limits need to be identical - we could lower the table name limit to, say, 192, but still allow the tablename:indexname to be even longer, up to 222 - i.e., allow creating materialized views even on tables whose name has 192 characters. So in this patch we split this variable into two, max_table_name_length and max_auxiliary_table_name_length. At the moment, both are still set to the same value - 222. In a following patch we plan to lower max_table_name_length but leave max_auxiliary_table_name_length at 222. Signed-off-by: Nadav Har'El <[email protected]>
Currently, in Alternator it is possible to create a table whose name has 222 characters, and then trying to add Streams to that table results in an attempt to create a CDC log table with the same name plus a 15-character suffix "_scylla_cdc_log", which resulted (Ref scylladb#24598) in an IO-error and a Scylla shutdown. This patch adds code to the Stream-adding operations (both CreateTable and UpdateTable) that validates that the table's name, plus that 15 character suffix, doesn't exceed max_auxiliary_table_name_length, i.e., 222. After this patch, if you have a table whose name is between 207 and 222 characters, attempting to enable Streams on it will fail with: "Streams cannot be added if the table name is longer than 207 characters." Note that in the future, if we lower max_table_name_length to below 207, e.g., to 192, then it will always be possible to add a stream to any legal table, and the new checks we had here will be mostly redundant. But only "mostly" - not entirely: Checking in UpdateTable is still important because of the possibility that an upgrading user might have a pre-existing table whose name is longer than the new limit, and might try to enable Streams. After this patch, the crash reported in scylladb#24598 can no longer happen, so in this sense the bug is solved. However, we still want to lower max_table_name_length from 222 to 192, so that it will always be possible to enable streams on any table with a legal name length. We'll do this in the next patch. Signed-off-by: Nadav Har'El <[email protected]>
Currently, Alternator allows creating a table with a name up to 222 (max_table_name_length) characters in length. But if you do create a table with such a long name, you can have some difficulties later: You you will not be able to add Streams or GSI or LSI to that table, because 222 is also the absolute maximum length Scylla tables can have and the auxilliary tables we want to create (CDC log, materialized views) will go over this absolute limit (max_auxiliary_table_name_length). This is not nice. DynamoDB users assume that after successfully creating a table, they can later - perhaps much later - decide to add Streams or GSI to it, and today if they chose extremely long names, they won't be able to do this. So in this patch, we lower max_table_name_length from 222 to 192. A user will not be able to create tables with longer names, but the good news is that once successfully creating a table, it will always be possible to enable Streams on it (the CDC log table has an extra 15 bytes in its name, and 192 + 15 is less than 222), and it will be possible to add GSIs with short enough names (if the GSI name is 29 or less, 192 + 29 + 1 = 222). This patch is a trivial one-line code change, but also includes the corrected documentation of the limits, and a fix for one test that previously checked that a table name with length 222 was allowed - and now needs to check 192 because 222 is no longer allowed. Note that if a user has existing tables and upgrades Scylla, it is possible that some pre-existing Alternator tables might have lengths over 192 (up to 222). This is fine - in the previous patches we made sure that even in this case, all operations will still work correctly on these old tables (by not not validating the name!), and we also made sure that attempting to enable Streams may fail when the name is too long (we do not remove those old checks in this patch, and don't plan to remove them in the forseeable future). Note that the limit we chose - 192 characters - is identical to the table name limit we recently chose in CQL. It's nicer that we don't need to memorize two different limits for Alternator and CQL. Signed-off-by: Nadav Har'El <[email protected]>
|
Pushed a new version with cosmetic improvements suggested in @ScyllaPiotr 's review and no functional changes. |
🟢 CI State: SUCCESS
Build Details:
|
| bool = false); | ||
|
|
||
| static void add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp); | ||
| static bool add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp); |
There was a problem hiding this comment.
Sorry for not noticing it before. Let's explain what is the meaning of the returned value (it's the intended state of the Stream and not whether the change was successful).
ScyllaPiotr
left a comment
There was a problem hiding this comment.
Approving again, for clarity.
|
@scylladb/scylla-maint please consider merging. There are two approvals and tests passed. |
Before this series, it is possible to crash Scylla (due to an I/O error) by creating an Alternator table close to the maximum name length of 222, and then enabling Alternator Streams. This series fixes this bug in two ways:
222 - strlen("_scylla_cdc_log")) - for such tables enabling Streams will fail, but no longer crash.No need to backport, Alternator Streams is still an experimental feature and this patch just improves the unlikely situation of extremely long table names.
Fixes #24598