CASSANDRA-20389 prevent too long table names not fitting file names#4038
CASSANDRA-20389 prevent too long table names not fitting file names#4038k-rus wants to merge 3 commits intoapache:trunkfrom
Conversation
pkolaczk
left a comment
There was a problem hiding this comment.
LGTM, just a few minor code-style nitpicks ;)
|
@smiklosovic @pkolaczk I pushed a commit to separate name length validations from the validation for emptiness and characters. The rational is:
What do you think about the last implementation? |
5f8ac66 to
6fc8152
Compare
|
I updated the PR title to include the latest changes. |
|
i wont be available whole next week |
|
@k-rus see? I love the current approach! I will try to review this, definitely after 5th May, maybe next week but as I said availability will be limited. You are free to ask another commiter for the second review if you want. edit: I put some comments in. |
I confirm that I understand that you will have possibility to review after next week. I think it's better to wait until you are back. Meanwhile I addressed some suggestions and replied to another. |
| if (!isValidName(name)) | ||
| except("Table name must not be empty or not contain non-alphanumeric-underscore characters (got \"%s\")", name); | ||
|
|
||
| if (name.length() > TABLE_NAME_LENGTH) |
There was a problem hiding this comment.
this is a little bit off, I would expect that if isValidTableName is true, then I will not be required to have yet another check where I check if name.length() > TABLE_NAME_LENGTH. That might be part of that already no?
What might happen is that a dev calls "isValidTableName(name)" and if it returns true, why would they check if length is smaller than TABLE_NAME_LENGTH. They just forget to do this. Because they just got "true" from "isValidTableName" .
In other words, by having the logic spread here like this, "isValidName" is not actually telling truth, does it?
There was a problem hiding this comment.
These two checks and the assert are replaced with a single method call to validateTableName, which also encapsulates throwing an exception on invalid table name.
There was a problem hiding this comment.
ping @smiklosovic , do you agree with @k-rus 's fix ? changing the name from "!isValidName(name)" to "isValidCharsName"
| if (name.length() > TABLE_NAME_LENGTH) | ||
| except("Table name must not be more than %d characters long (got %d characters for \"%s\")", TABLE_NAME_LENGTH, name.length(), name); | ||
|
|
||
| assert getTableDirectoryName().length() <= FILENAME_LENGTH : String.format("Generated directory name for a table of %d characters doesn't fit the max filename legnth of %s. This unexpectedly wasn't prevented by check of the table name length, %d, to fit %d characters (got table name \"%s\" and generated directory name \"%s\"", getTableDirectoryName().length(), FILENAME_LENGTH, name.length(), TABLE_NAME_LENGTH, name, getTableDirectoryName()); |
There was a problem hiding this comment.
same with this. If I got true from "isValidTableName" there is no way I would go and check this as well.
That means what we would move three "get" methods bellow to SchemaConstants as well and accommodate their signatures to reflect the changes.
There was a problem hiding this comment.
I disagree with the suggestion of moving getTableDirectoryName into the utility class. As it operate on a table name and used with TableMetadata instance.
a8f20f7 to
85e976b
Compare
Currently the table name length is limited to 48 characters, same way as all the other names. This was compatible with Cassandra 3 but due to a bug the validation of this constraint was removed in Cassandra 4 and 5 (see CASSANDRA-20389). Nowadays, some of the usage scenarios depends on the lack of limitation in Cassandra, such as some feature store workflows generating long table names. This path introduces the behavior that is compatible with Cassandra 4 and 5. It limits table name to 222 characters becasue when the existing code creates a directory to store the new table, this directory's name is created by sstables::init_table_storage(), which takes the table's full name, a dash (-), and a 32-byte UUID string. Because most Linux filesystems limit filename components to 255 bytes, this means that any table name longer than 222 bytes will attempt to mkdir() a directory name longer than the allowed 255 bytes, and fail. Exactly the same limit (222 characters) is proposed in the patch fixing directory creation failures in Cassandra 4 and 5 (see github.com/apache/cassandra/pull/4038, SchemaConstants.java).
Currently the name length of tables, materialized views, keyspaces and indexes is limited to 48 characters. This was compatible with Cassandra 3 but due to a bug the validation of this constraints was removed in Cassandra 4 and 5 (see CASSANDRA-20389). Nowadays, some of the usage scenarios depends on the lack of limitation in Cassandra, such as some feature store workflows generating long table names. This path introduces the behavior that is compatible with Cassandra 4 and 5. It limits names to 218 characters because when the existing code creates a directory to store the new table, this directory's name is created by sstables::init_table_storage(), which takes the table's full name, a dash (-), and a 32-byte UUID string. Because most Linux filesystems limit filename components to 255 bytes, this means that any table name longer than 222 bytes will attempt to mkdir() a directory name longer than the allowed 255 bytes, and fail. Extra 4 characters are reserved for _cdc suffix for CDC-generated tables making the limit equal 218 chars. Similar limit (222 characters) is proposed in the patch fixing directory creation failures in Cassandra 4 and 5 (see github.com/apache/cassandra/pull/4038, SchemaConstants.java). The patch also fixes tests that were broken by introducing this patch.
Currently the name length of tables, materialized views, keyspaces and indexes is limited to 48 characters. This was compatible with Cassandra 3 but due to a bug the validation of this constraints was removed in Cassandra 4 and 5 (see CASSANDRA-20389). Nowadays, some of the usage scenarios depends on the lack of limitation in Cassandra, such as some feature store workflows generating long table names. This path introduces the behavior that is compatible with Cassandra 4 and 5. It limits names to 218 characters because when the existing code creates a directory to store the new table, this directory's name is created by sstables::init_table_storage(), which takes the table's full name, a dash (-), and a 32-byte UUID string. Because most Linux filesystems limit filename components to 255 bytes, this means that any table name longer than 222 bytes will attempt to mkdir() a directory name longer than the allowed 255 bytes, and fail. Extra 4 characters are reserved for _cdc suffix for CDC-generated tables making the limit equal 218 chars. Similar limit (222 characters) is proposed in the patch fixing directory creation failures in Cassandra 4 and 5 (see github.com/apache/cassandra/pull/4038, SchemaConstants.java). The patch also fixes tests that were broken by introducing this patch.
Currently the name length of tables, materialized views, keyspaces and indexes is limited to 48 characters. This was compatible with Cassandra 3 but due to a bug the validation of this constraints was removed in Cassandra 4 and 5 (see CASSANDRA-20389). Nowadays, some of the usage scenarios depends on the lack of limitation in Cassandra, such as some feature store workflows generating long table names. This path introduces the behavior that is compatible with Cassandra 4 and 5. It limits names to 218 characters because when the existing code creates a directory to store the new table, this directory's name is created by sstables::init_table_storage(), which takes the table's full name, a dash (-), and a 32-byte UUID string. Because most Linux filesystems limit filename components to 255 bytes, this means that any table name longer than 222 bytes will attempt to mkdir() a directory name longer than the allowed 255 bytes, and fail. Extra 4 characters are reserved for _cdc suffix for CDC-generated tables making the limit equal 218 chars. Similar limit (222 characters) is proposed in the patch fixing directory creation failures in Cassandra 4 and 5 (see github.com/apache/cassandra/pull/4038, SchemaConstants.java). The patch also fixes tests that were broken by introducing this patch.
|
@beobal would you mind take a look ? |
b906eb3 to
d9941e7
Compare
|
I am + 1 if all my comments are addressed and ci is green . But i think it is better if @beobal or @ekaterinadimitrova2 take another look. |
Thank you for the review and finding my implementation for the ticket suitable as all the comments are not related to the main contribution. I hope it will be possible to find a reasonable way to deliver those changes as I find important to deliver them.
As new reviewers might disagree with the current main implementation for the ticket I will wait with running an extensive CI. |
I can think that adding VScode to ignore can be done in a separate ticket and PR, but it will be blocker for this PR until it's committed to trunk and release branches. Other changes do not make much sense outside this PR and will be difficult to deliver, so I still think that the separate commit is a good tradeoff. |
I don't mind reviewing, but we already have three committers doing a review here. Plus @pkolaczk, who is also a long-time contributor. Is there anything specific you are worried about and you want to discuss? |
812c2e1 to
31c8233
Compare
|
I ported this patch to 5.0, see #4325. It failed a test on the error message for keyspace of @pkolaczk @netudima @Maxwell-Guo |
Actually I found another bug in this implementation. |
7efcb08 to
c9321b5
Compare
|
Now I fixed that the keyspace validation throws the same exceptions as before this PR. See c9321b57c4fe078561cf9bcbed15dd2612277c6d Unfortunately I don't find a good way to add tests on the error message in such case, since it's about internal calls as normally keyspaces are not created in the same API call for creating tables. Also some formatting doesn't look great. I have double checked that this is in accordance with The result of running CI doesn't show any related failures: http://ci-cassandra.infra.datastax.com/job/cassandra/25/testReport/ @pkolaczk @netudima @Maxwell-Guo do you want to review the new commit? c9321b57c4fe078561cf9bcbed15dd2612277c6d (the commit will be squashed after I am sure that the work is accepted) |
a71489f to
f53e6cf
Compare
|
While porting this patch to 4.1 I noticed one more place where I missed to throw exception on keyspace validation error. As result I refactored this PR to throw always from inside the keyspace validation and not rely on passed exception throwers from callers. See this commit: f53e6cfc798eb038bdf71539e9a0b945b876baf5 I plan to port this commit to 5.0 and 4.1 work, and will squash commits here. |
pkolaczk
left a comment
There was a problem hiding this comment.
Makes sense.
Yes, moving throw to inside validate is much cleaner and less fragile than before.
f53e6cf to
872fcc6
Compare
Includes: - Simplify passing methods as lambdas. - Use chars instead of single char strings. - Simplify assertion method calls. - Remove unnecessary method throws declarations. - Add VScode local folder to gitignore. - Fix typo in a comment
Fixes CASSANDRA-20389 The length of table names was not controlled. This is likely due to confusion between validation methods with similar names. As result creating tables with too long names led to the too long file name exceptions during table creations. This commit adds a validation of table name lengths to avoid the too long file name errors. The validation length is based on how the table name is used to create file/directory names, and needs to be exact to prevent the too long file name exception, but allow all other table names, which didn't lead to the too long file name exception. This length limit is different from the existing name length limit of 48 characters used by common validation functions. Thus, this commit moves out the length validation from the validation methods into a separate length validation method, so the errors on names are more specific. The non-length validation methods combined into a single method, which checks for empty names and valid characters. New constants are added for the length limits. Table name related code are moved into methods in TableMetadata class, so their semantics are more clear and to allow reuse, e.g., in asserting the table name length constant. Tests are added for the long table names and non-alphanumeric names. Keyspace name validation function is now shared between two classes and a unit test of it is added. Patch by Ruslan Fomkin; reviewed by Piotr Kołaczkowski, Dmitry Konstantinov, Maxwell Guo for CASSANDRA-20389
872fcc6 to
150b528
Compare
|
merged |
The length of table names was not controlled. This is likely due to confusion between validation functions with similar names. As result creating a table with too long name led to the too long file name exception during the table creation.
This commit adds a validation of table name lengths to avoid the too long file name errors. The validation length is based on how the table name is used to create file/directory name and needs to be exact to prevent the too long file name exception, but allow all other table names, which don't lead to the too long file name exception. Since this limit is different from the existing name length limit of 48 characters, the validation of lengths is separated from the common validation function, which checks for empty name and valid characters. This separation also allows to be specific with errors in number of cases.
Note that there will be more PRs with more cases for different name length:
New constants are added for the table name length limit and for the file name length limit.
Two similar validation functions are replaced with one to avoid confusion. Name related constants are stored in the same utility class.
Table name related code are moved into methods in
TableMetadataclass, so their semantics are more clear and to allow reuse, e.g., in asserting the table name length constant.Tests are added for the long table names and non-alphanumeric names.
Keyspace name validation function is shared between two classes and a unit test of it is added.
Other changes in this commit:
patch by Ruslan Fomkin; reviewed by Piotr Kołaczkowski, Dmitry Konstantinov, Maxwell Guo for CASSANDRA-20389