cql, schema: Extend keyspace, table, views, indexes name length limit from 48 to 192 bytes #24500

Merged
scylladb-promoter merged 2 commits into scylladb:master from
knowack1:4480-allow-unlimited-table-name-length
Jun 22, 2025

@knowack1
Contributor

@knowack1 knowack1 commented Jun 12, 2025

cql, schema: Extend name length limit from 48 to 192 bytes

This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
The previous 48-byte limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.

The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
The directory name for this log table becomes the longest possible representation.
Additionally, we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
  255 bytes (common filesystem limit for a path component)
-  32 bytes (for the 32-character UUID string)
-   1 byte  (for the '-' separator)
-  15 bytes (for the '_scylla_cdc_log' suffix)
-  15 bytes (reserved for future use)
----------
= 192 bytes (Maximum allowed name length)
This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).
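
The arithmetic above can be checked with a minimal Python sketch. The constant names and the `table_dir_name` helper are hypothetical and exist only for illustration; they are not ScyllaDB identifiers. The sketch assumes the directory name layout described in this cover letter (table name, a dash, and a 32-character UUID).

```python
import uuid

# Byte budget for one path component, following the cover letter's arithmetic.
FS_COMPONENT_LIMIT = 255            # common filesystem limit for a path component
UUID_LEN = 32                       # 32-character UUID string
SEPARATOR_LEN = len("-")            # the '-' separator
CDC_LOG_SUFFIX = "_scylla_cdc_log"  # suffix of the CDC log table (15 bytes)
RESERVED = 15                       # reserved for future use

MAX_NAME_LEN = (FS_COMPONENT_LIMIT - UUID_LEN - SEPARATOR_LEN
                - len(CDC_LOG_SUFFIX) - RESERVED)


def table_dir_name(table_name: str, table_uuid_hex: str) -> str:
    """Hypothetical helper: builds '<table name>-<32 hex characters>'."""
    return f"{table_name}-{table_uuid_hex}"


# Longest case: a maximum-length table name plus the CDC log suffix.
longest_name = "t" * MAX_NAME_LEN + CDC_LOG_SUFFIX
longest_dir = table_dir_name(longest_name, uuid.uuid4().hex)

print(MAX_NAME_LEN)      # maximum allowed name length
print(len(longest_dir))  # stays below FS_COMPONENT_LIMIT by the reserved bytes
```

Under these assumptions, the longest directory name is shorter than the 255-byte component limit by exactly the 15 reserved bytes.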

This patch also updates/adds all associated tests to validate the new 192-byte limit.
The documentation has been updated accordingly.
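
As a rough illustration of the boundary the updated tests exercise, here is a self-contained Python sketch. It does not talk to a server, and `is_valid_name` is a hypothetical stand-in for the server-side validation; the character rule (ASCII letters, digits, and underscores) is an assumption made for this sketch, not something taken from the patch.

```python
import re

MAX_NAME_LEN = 192  # new limit from this PR
OLD_MAX_NAME_LEN = 48  # previous limit inherited from Cassandra 3

# Assumption for this sketch: names consist of ASCII letters, digits, and '_'.
_NAME_RE = re.compile(r"\w+", re.ASCII)


def is_valid_name(name: str, limit: int = MAX_NAME_LEN) -> bool:
    """Hypothetical validator: checks the character set and the byte length."""
    return (_NAME_RE.fullmatch(name) is not None
            and len(name.encode("utf-8")) <= limit)


# Boundary values around the new limit.
assert is_valid_name("a" * MAX_NAME_LEN)
assert not is_valid_name("a" * (MAX_NAME_LEN + 1))

# A 49-byte name was rejected under the old limit and is accepted now.
assert not is_valid_name("a" * 49, limit=OLD_MAX_NAME_LEN)
assert is_valid_name("a" * 49)
```

The in-tree tests in test/cqlpy check the same boundary against a running node rather than a local function.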

Fixes #4480

Backport 2025.2: The significantly shorter maximum table name length in Scylla compared to Cassandra is becoming a more common issue for users in the latest release.

@knowack1 knowack1 requested a review from ScyllaPiotr June 12, 2025 16:05
@knowack1 knowack1 force-pushed the 4480-allow-unlimited-table-name-length branch 2 times, most recently from fd6d7d0 to d953be7 Compare June 12, 2025 16:22
Comment thread test/cqlpy/cassandra_tests/validation/operations/create_test.py Outdated
@ScyllaPiotr
Contributor

Congrats on your 1st PR! 👏 A couple of general comments from me.

It's not customary to put issue number into the title of the PR, but rather a module name or scope name.

Also, following my suggestion in the issue thread, let's rename the PR accordingly.

Furthermore, you write in the cover letter that

> some usage scenarios now depend on this relaxed constraint

Could you name a few such scenarios, just for clarity? Or should it be changed to "some usage scenarios could benefit from relaxing this constraint"?

Otherwise, congrats on the cover letter, it's very communicative!

@ScyllaPiotr
Contributor

Also, a PR requires a backport label. This current one IMO fits into backport/none. Please set it.
What's more, the cover letter seems to contain a large part formatted as code, please remove the formatting.
Also, the cover letter should be ending with a statement about backporting. The relevant section shows up in the cover letter template when you create a PR. Please see here for an example of a good line on backport.

@knowack1 knowack1 changed the title from "4480 allow unlimited table name length" to "cql, schema: Extend keyspace, table, views, indexes name length limit from 48 to 207 bytes #4480" Jun 13, 2025
@knowack1 knowack1 changed the title from "cql, schema: Extend keyspace, table, views, indexes name length limit from 48 to 207 bytes #4480" to "cql, schema: Extend keyspace, table, views, indexes name length limit from 48 to 207 bytes" Jun 13, 2025
@knowack1 knowack1 force-pushed the 4480-allow-unlimited-table-name-length branch from d953be7 to c6df135 Compare June 13, 2025 07:57
@knowack1 knowack1 force-pushed the 4480-allow-unlimited-table-name-length branch 2 times, most recently from 6212bdc to 4fe29ff Compare June 13, 2025 08:35
@knowack1
Contributor Author

> It's not customary to put issue number into the title of the PR, but rather a module name or scope name.

Fixed

> Could you name a few such scenarios, just for clarity? Or should it be changed into some usage scenarios could benefit from relaxing this constraint?

Done

@knowack1
Contributor Author

> Also, a PR requires a backport label. This current one IMO fits into backport/none. Please set it.

Set backport/2025.2 as @swasik did in #24295

> What's more, the cover letter seems to contain a large part formatted as code, please remove the formatting.

Fixed

> Also, the cover letter should be ending with a statement about backporting.

Done in pull request description only.

Contributor

@ScyllaPiotr ScyllaPiotr left a comment

Very nice suite of tests and corrections, congrats!
Below you'll find some questions and remarks.

Comment thread test/cqlpy/util.py Outdated
Comment thread test/cqlpy/util.py Outdated
Comment thread test/cqlpy/util.py Outdated
Comment thread test/cqlpy/test_name.py
Comment thread test/cqlpy/test_name.py
Comment thread test/cqlpy/test_name.py
Comment thread test/cqlpy/test_name.py Outdated
Comment thread schema/schema.hh Outdated
Comment thread test/cqlpy/test_keyspace.py
Comment thread test/cqlpy/test_name.py
@ScyllaPiotr
Contributor

> > Also, a PR requires a backport label. This current one IMO fits into backport/none. Please set it.

> Set backport/2025.2 as @swasik did in #24295

@swasik @ewienik do you intend to backport VS to 2025.2? The current justification for backport is: needed to make ScyllaDB work with some of the AI workflows.

@ewienik
Contributor

ewienik commented Jun 13, 2025

> > > Also, a PR requires a backport label. This current one IMO fits into backport/none. Please set it.

> > Set backport/2025.2 as @swasik did in #24295

> @swasik @ewienik do you intend to backport VS to 2025.2? The current justification for backport is: needed to make ScyllaDB work with some of the AI workflows.

Vector Search is a new feature and should not be backported. This issue is not part of Vector Search; I assume there are other AI workflows which need to be supported in previous versions.

@knowack1 knowack1 force-pushed the 4480-allow-unlimited-table-name-length branch 3 times, most recently from ec85f67 to 5caebf0 Compare June 13, 2025 15:07
Comment thread docs/reference/limits.rst Outdated
Comment thread schema/schema.hh Outdated
Comment thread test/cqlpy/cassandra_tests/validation/operations/create_test.py Outdated
Comment thread test/cqlpy/test_keyspace.py
Comment thread test/cqlpy/test_name.py Outdated
Comment thread test/cqlpy/test_name.py Outdated
Comment thread test/cqlpy/test_name.py Outdated
Comment thread test/cqlpy/test_name.py Outdated
Comment thread test/cqlpy/test_name.py Outdated
Comment thread test/cqlpy/test_name.py Outdated
@knowack1 knowack1 force-pushed the 4480-allow-unlimited-table-name-length branch 7 times, most recently from 094892c to 43721df Compare June 17, 2025 07:01
@knowack1
Contributor Author

> @knowack1 unfortunately some dtests failed on tests like:
>
> assert_invalid(session, "CREATE KEYSPACE My_much_much_too_long_identifier_that_should_not_work WITH replication = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1 }")
>
> You'll need to send a PR to the dtest repository to fix those tests before we can commit this PR. I propose that the fix should be to outright delete the affected tests (or just specific checks in a bigger test) - these tests have nothing to do with distributed tests, and should be (and are) in-tree single-node function tests in test/cqlpy.

@nyh PR prepared: https://github.com/scylladb/scylla-dtest/pull/5975. Is it possible to rerun CI on this PR using dtest from the https://github.com/scylladb/scylla-dtest/pull/5975 branch?
I was able to run the dtests locally with my Scylla version, and the failing test passes.

@nyh
Contributor

nyh commented Jun 18, 2025

> > You'll need to send a PR to the dtest repository to fix those tests before we can commit this PR. I propose that the fix should be to outright delete the affected tests (or just specific checks in a bigger test) - these tests have nothing to do with distributed tests, and should be (and are) in-tree single-node function tests in test/cqlpy.

> @nyh PR prepared scylladb/scylla-dtest#5975. Is it possible to rerun CI on this PR using dtest from scylladb/scylla-dtest#5975 branch? I was able to run dtests locally with my Scylla version and failing test passes.

Thanks. I merged your dtest PR. I propose that we just wait until that change promotes in the dtest repository, and then you can rerun this CI without any special trickery. It's possible to tell CI to use a different branch of dtest, but I don't want to do that, because if I do that and merge this PR - then the "next promotion" stage, which also runs dtests, will fail. Let's just do it the slow and sure way.

@avikivity
Member

> > What's the new Cassandra limit?
> > Increasing the limit all the way to the maximum doesn't leave us any reserve. I don't see how we'd need the reserve, and for sure we can work around it with a more complicated directory structure, but no one will mind 207 vs 194 or whatever.

> Changed to 192 bytes. Now we have 15 bytes reserved for future use.

15 whole bytes!

Ok.

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Framework test
✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/cql_query_test
❌ - dtest with tablets
❌ - dtest with gossip topology changes
❌ - dtest with consistent topology changes
❌ - Unit Tests

Failed Tests (3/41995):

Build Details:

  • Duration: 12 hr
  • Builder: spider6.cloudius-systems.com

@scylladb-promoter
Contributor

🟢 CI State: SUCCESS

❌ - Framework test

Build Details:

  • Duration: 3 min 0 sec
  • Builder: spider6.cloudius-systems.com

@ScyllaPiotr
Contributor

ScyllaPiotr commented Jun 20, 2025

> ❌ - Framework test

So how did the framework test fail? This and this show only `invalid internal status, try resetting the pause process with "podman system migrate": could not find any running process: no such process`, as if some component never launched.

@scylladb-promoter
Contributor

🟢 CI State: SUCCESS

✅ - Framework test
✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/cql_query_test
✅ - dtest with tablets
✅ - dtest with gossip topology changes
✅ - dtest with consistent topology changes
✅ - Unit Tests

Build Details:

  • Duration: 7 hr 3 min
  • Builder: spider4.cloudius-systems.com

@knowack1 knowack1 requested a review from nyh June 20, 2025 15:19
@knowack1
Contributor Author

After I manually triggered a rebuild of the dtest-related jobs on Jenkins, using the next branch of the scylla-dtest repo, the jobs passed.

@nyh
Contributor

nyh commented Jun 22, 2025

> > Changed to 192 bytes. Now we have 15 bytes reserved for future use.

> 15 whole bytes!
>
> Ok.

@avikivity can you please be more explicit if you think this number 192 is ok, or not really ok?

Note that the real filesystem limit is 222, so we actually left 30 bytes "spare". CDC uses 15 out of these spare, but it's not additive - if some other feature, e.g., Paxos or Feature X needs to create its own table, it can also use the whole extra 30 bytes in the new table's name and it doesn't care that CDC used a 15 byte suffix.
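
To make the "not additive" point concrete, here is a small Python sketch. The names are hypothetical, and the second suffix is invented purely for illustration; only `_scylla_cdc_log` comes from this PR. It assumes, as in the cover letter, that each table gets its own directory named from the table name, a dash, and a 32-character UUID.

```python
FS_COMPONENT_LIMIT = 255  # common filesystem limit for a path component
UUID_LEN = 32             # 32-character UUID string
SEPARATOR_LEN = 1         # the '-' separator
MAX_NAME_LEN = 192        # user-visible name limit from this PR

# Bytes left for the table name itself inside one directory name.
NAME_BUDGET = FS_COMPONENT_LIMIT - UUID_LEN - SEPARATOR_LEN
SPARE = NAME_BUDGET - MAX_NAME_LEN


def derived_table_fits(suffix: str) -> bool:
    """Each derived table has its own directory, so only its own suffix counts."""
    return MAX_NAME_LEN + len(suffix.encode("utf-8")) <= NAME_BUDGET


cdc_suffix = "_scylla_cdc_log"     # 15 bytes, from this PR
feature_x_suffix = "_" + "x" * 29  # made-up 30-byte suffix for illustration

print(NAME_BUDGET, SPARE)
print(derived_table_fits(cdc_suffix), derived_table_fits(feature_x_suffix))
```

Both suffixes fit independently even though together they exceed the spare bytes, because they never appear in the same directory name.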

@avikivity
Member

> > > Changed to 192 bytes. Now we have 15 bytes reserved for future use.

> > 15 whole bytes!
> > Ok.

> @avikivity can you please be more explicit if you think this number 192 is ok, or not really ok?

It's ok.

> Note that the real filesystem limit is 222, so we actually left 30 bytes "spare". CDC uses 15 out of these spare, but it's not additive - if some other feature, e.g., Paxos or Feature X needs to create its own table, it can also use the whole extra 30 bytes in the new table's name and it doesn't care that CDC used a 15 byte suffix.

Right.

nyh added a commit that referenced this pull request Jun 22, 2025
…h limit from 48 to 192 bytes' from Karol Nowacki

    cql, schema: Extend name length limit from 48 to 192 bytes

    This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
    The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
    and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
    This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.

    The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
    When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
    For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
    The directory name for this log table becomes the longest possible representation.
    Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
    To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
      255 bytes (common filesystem limit for a path component)
    -  32 bytes (for the 32-character UUID string)
    -   1 byte  (for the '-' separator)
    -  15 bytes (for the '_scylla_cdc_log' suffix)
    -  15 bytes (reserved for future use)
    ----------
    = 192 bytes (Maximum allowed name length)
    This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).

    This patch also updates/adds all associated tests to validate the new 192-byte limit.
    The documentation has been updated accordingly.

Fixes #4480

Backport 2025.2: The significantly shorter maximum table name length in Scylla compared to Cassandra is becoming a more common issue for users in the latest release.

Closes #24500

* github.com:scylladb/scylladb:
  cql, schema: Extend name length limit from 48 to 192 bytes
  replica: Remove unused keyspace::init_storage()
@scylladb-promoter scylladb-promoter merged commit 85c19d2 into scylladb:master Jun 22, 2025
31 checks passed
@scylladbbot

⚠️ @knowack1 you have been added as collaborator to scylladbbot fork
Please check your inbox and approve the invitation, otherwise you will not be able to edit PR branch when needed

@swasik
Contributor

swasik commented Jun 23, 2025

> > > > Also, a PR requires a backport label. This current one IMO fits into backport/none. Please set it.

> > > Set backport/2025.2 as @swasik did in #24295

> > @swasik @ewienik do you intend to backport VS to 2025.2? The current justification for backport is: needed to make ScyllaDB work with some of the AI workflows.

> Vector Search is a new feature and should not be backported. This issue is not part of Vector Search; I assume there are other AI workflows which need to be supported in previous versions.

Yes, it was for an external framework that allows implementing feature stores using various DBs: https://github.com/featureform/featureform - we want to have a working integration ASAP, and that is why the backport is needed.

nyh added a commit that referenced this pull request Jun 23, 2025
…indexes name length limit from 48 to 192 bytes' from Scylladb[bot]

    cql, schema: Extend name length limit from 48 to 192 bytes

    This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
    The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
    and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
    This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.

    The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
    When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
    For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
    The directory name for this log table becomes the longest possible representation.
    Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
    To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
      255 bytes (common filesystem limit for a path component)
    -  32 bytes (for the 32-character UUID string)
    -   1 byte  (for the '-' separator)
    -  15 bytes (for the '_scylla_cdc_log' suffix)
    -  15 bytes (reserved for future use)
    ----------
    = 192 bytes (Maximum allowed name length)
    This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).

    This patch also updates/adds all associated tests to validate the new 192-byte limit.
    The documentation has been updated accordingly.

Fixes #4480

Backport 2025.2: The significantly shorter maximum table name length in Scylla compared to Cassandra is becoming a more common issue for users in the latest release.

- (cherry picked from commit a41c12c)

- (cherry picked from commit 4577c66)

Parent PR: #24500

Closes #24603

* github.com:scylladb/scylladb:
  cql, schema: Extend name length limit from 48 to 192 bytes
  replica: Remove unused keyspace::init_storage()
Development

Successfully merging this pull request may close these issues.

Allow longer table name length

8 participants