Escape index filenames to prevent broken parts by Algunenano · Pull Request #94079 · ClickHouse/ClickHouse

Algunenano · 2026-01-13T16:59:50Z

Changelog category (leave one):

Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Escape index filenames to prevent broken parts. With this change ClickHouse will fail to load indices with non-ascii characters in their name created by previous versions. To handle it you can use the merge tree setting escape_index_filenames.

Queries won't fail if indices cannot be loaded, you will see a Debug message like:

File for index `minmax_index_ÑÑÑÑÑÑ` does not exist (./store/019/0199d890-b99b-4c29-97d1-b037a85fe65d/all_2_2_0/skp_idx_minmax_index_%C3%91%C3%91%C3%91%C3%91%C3%91%C3%91.*). Skipping it.

This message is the same as when the index is not materialized for a part. To workaround this change and be able to load old indices you can alter the table settings (alter table test modify setting escape_index_filenames=0).

Closes #91417

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-01-13T17:00:28Z

Workflow [PR], commit [8b3c8fb]

Summary: ❌

job_name	test_name	status	info	comment
Integration tests (amd_tsan, 3/6)		failure
	test_restore_db_replica/test.py::test_query_after_restore_db_replica[alter table-no exists table-no restart]	FAIL	cidb	IGNORED
BuzzHouse (amd_debug)		failure
	Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa)	FAIL	cidb, issue	ISSUE EXISTS
Stateless tests (arm_asan, targeted)		error		IGNORED
BuzzHouse (amd_ubsan)		error		IGNORED

Copilot

Pull request overview

This PR fixes a critical issue where MergeTree indices with non-ASCII characters in their names could create broken data parts. The fix applies proper filename escaping to prevent filesystem-related issues with special characters.

Changes:

Centralized filename escaping for index names by moving escapeForFileName logic into IMergeTreeIndex::getFileName()
Updated mutation and size calculation code to use the escaped filenames consistently
Added a regression test with non-UTF8 and special characters in index names

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/Storages/MergeTree/MergeTreeIndices.h	Changed `getFileName()` from inline to a separate implementation and updated documentation
src/Storages/MergeTree/MergeTreeIndices.cpp	Implemented `getFileName()` with proper filename escaping using `escapeForFileName`
src/Storages/MergeTree/IMergeTreeDataPart.cpp	Removed redundant escaping since it's now handled in `getFileName()` and added escaping to `hasSecondaryIndex()`
src/Storages/MergeTree/MutateTask.cpp	Updated to use `escapeForFileName` for index filename construction during mutations
tests/queries/0_stateless/03789_skip_index_bad_utf8.sql	Regression test with non-ASCII characters in index names to verify attach/detach works correctly
tests/queries/0_stateless/03789_skip_index_bad_utf8.reference	Expected output for the regression test

src/Storages/MergeTree/MutateTask.cpp

Copilot

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.

src/Core/SettingsChangesHistory.cpp

tests/integration/test_index_filename_upgrade/test.py

src/Storages/MergeTree/MergeTreeIndices.cpp

Co-authored-by: Copilot <[email protected]>

Algunenano · 2026-01-16T12:28:58Z

Added a compatibility setting so indices keep working after upgrading (as long as you set compatibility or escape_index_filenames correctly)

nikitamikhaylov · 2026-01-21T14:13:34Z

tests/integration/test_index_filename_upgrade/test.py

+    node.restart_with_latest_version()
+
+    node.query("SELECT count() FROM test_index_filename WHERE `column` = 24 SETTINGS use_query_condition_cache=0;")
+    node.wait_for_log_line("File for index `minmax_index_ESPAÑA` does not exist")


Looks like the test is flaky because of this line. Why? Does it fail to discard the index?

Having a look. Thanks!

The problem was that when running the test multiple times the test framework starts with the version it was left (master), not with the original declared in the cluster (25.12).

nikitamikhaylov · 2026-01-21T14:17:12Z

src/Core/SettingsChangesHistory.cpp

        {
            {"min_columns_to_activate_adaptive_write_buffer", 500, 500, "New setting"},
            {"materialize_statistics_on_merge", true, true, "New setting"},
+            {"escape_index_filenames", false, true, "Escape non-ascii characters in filenames created for indices"},


I was thinking maybe we can have a better way how to preserve a compatibility e.g. automatically fallback to non-escaped file name in case we didn't find an escaped one on the disk, but no. You solution looks better. JFYI

I thought the same thing, then I started doing it and it required too many changes in many disconnected places

Escape index filenames to prevent broken parts

725df5d

clickhouse-gh bot added the pr-backward-incompatible Pull request with backwards incompatible changes label Jan 13, 2026

Always use escapeForFileName

89c3539

nikitamikhaylov self-assigned this Jan 13, 2026

Algunenano requested a review from Copilot January 14, 2026 12:34

Copilot AI reviewed Jan 14, 2026

View reviewed changes

src/Storages/MergeTree/MutateTask.cpp Outdated Show resolved Hide resolved

src/Storages/MergeTree/MutateTask.cpp Outdated Show resolved Hide resolved

Algunenano mentioned this pull request Jan 14, 2026

Minmax indices by default #76867

Draft

Add 'escape_index_filenames' to handle compatibility

115d710

Algunenano requested a review from Copilot January 16, 2026 12:14

Copilot AI reviewed Jan 16, 2026

View reviewed changes

src/Core/SettingsChangesHistory.cpp Show resolved Hide resolved

tests/integration/test_index_filename_upgrade/test.py Outdated Show resolved Hide resolved

src/Storages/MergeTree/MergeTreeIndices.cpp Show resolved Hide resolved

Update tests/integration/test_index_filename_upgrade/test.py

0164264

Co-authored-by: Copilot <[email protected]>

Algunenano marked this pull request as ready for review January 16, 2026 12:29

Flaky

ad356f2

Algunenano requested a review from nikitamikhaylov January 21, 2026 13:23

nikitamikhaylov reviewed Jan 21, 2026

View reviewed changes

nikitamikhaylov approved these changes Jan 21, 2026

View reviewed changes

Algunenano and others added 4 commits January 22, 2026 15:22

Adjust test for multiple retries

f495e3b

Merge remote-tracking branch 'blessed/master' into idx_utf8

0630841

Filter during analysis so it's easier to verify

bf1e425

Update test.py

8b3c8fb

Algunenano added this pull request to the merge queue Jan 23, 2026

Merged via the queue into ClickHouse:master with commit a763c03 Jan 23, 2026
127 of 132 checks passed

Algunenano deleted the idx_utf8 branch January 23, 2026 14:39

robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Escape index filenames to prevent broken parts#94079

Escape index filenames to prevent broken parts#94079
Algunenano merged 9 commits intoClickHouse:masterfrom
Algunenano:idx_utf8

Algunenano commented Jan 13, 2026 •

edited

Loading

Uh oh!

clickhouse-gh bot commented Jan 13, 2026 •

edited by Algunenano

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Algunenano commented Jan 16, 2026

Uh oh!

nikitamikhaylov Jan 21, 2026

Uh oh!

Algunenano Jan 21, 2026

Uh oh!

Algunenano Jan 22, 2026

Uh oh!

nikitamikhaylov Jan 21, 2026

Uh oh!

Algunenano Jan 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Algunenano commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Jan 13, 2026 • edited by Algunenano Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Algunenano commented Jan 16, 2026

Uh oh!

nikitamikhaylov Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Algunenano Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Algunenano Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

nikitamikhaylov Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Algunenano Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Algunenano commented Jan 13, 2026 •

edited

Loading

clickhouse-gh bot commented Jan 13, 2026 •

edited by Algunenano

Loading