Skip to content

Remove dangerous code#93652

Merged
alexey-milovidov merged 4 commits intomasterfrom
fix-scary-code
Jan 11, 2026
Merged

Remove dangerous code#93652
alexey-milovidov merged 4 commits intomasterfrom
fix-scary-code

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@rvasin, the code was extremely dangerous, also the comments were written in broken English.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jan 8, 2026

Workflow [PR], commit [13a23c7]

Summary:

job_name test_name status info comment
Integration tests (amd_asan, db disk, old analyzer, 4/6) failure
test_projection_rebuild_with_required_columns/test.py::test_projection_rebuild_uses_only_required_columns FAIL cidb
Integration tests (arm_binary, distributed plan, 4/4) failure
test_async_insert_adaptive_busy_timeout/test.py::test_with_merge_tree_multithread FAIL cidb, issue
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa) FAIL cidb
Integration tests (amd_asan, targeted) error

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 8, 2026
@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Jan 8, 2026

Thank you for pointing it. The comment could be further improved by adding the article:

"starting up after a sudden server shutdown".

Will usage of removeDirectoryIfExists() instead of removeRecursive() finally make the code less scary? In will look at the functions more in details on Monday Jan 12th.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Yes, because recursive removal is absolutely scary. Like you are in shock seeing this code and cannot work for the rest of the day.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Btw, why does the server use Memory databases in your setup?

@alexey-milovidov alexey-milovidov self-assigned this Jan 10, 2026
@alexey-milovidov alexey-milovidov merged commit 56f6828 into master Jan 11, 2026
125 of 132 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-scary-code branch January 11, 2026 15:49
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 11, 2026
@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Jan 11, 2026

BTW I reviewed my original PR for changes https://github.com/ClickHouse/ClickHouse/pull/46071/files#diff-98109370c20111715a20a680345d9bbe14149dfe5657152f3b356d01da474543 it was not me who started to use removeRecursive() in my PR I added the function as the following:

void DatabaseMemory::removeDataPath(ContextPtr local_context)
{
    std::filesystem::remove_all(local_context->getPath() + data_path);
}

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Ok. I'm a little bit confused - as the Memory database does not have any files, why does it have a directory? And if it does have some files, why? And then my changes might be incorrect (however, tests have passed).

@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Jan 11, 2026

Consider example:

CREATE TEMPORARY TABLE table_memory_02561(name String);
INSERT INTO table_memory_02561 (name) VALUES ('test');

CREATE TEMPORARY TABLE table_merge_tree_02525
(
    id UInt64,
    info String
)
ENGINE = MergeTree
ORDER BY id
PRIMARY KEY id;
INSERT INTO table_merge_tree_02525 VALUES (1, 'a'), (2, 'b'), (3, 'c');

When we create the second table table_merge_tree_02525 then the folder _temporary_and_external_tables/ is created inside var/lib/clickhouse/data/
this folder has _tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9/ subfolder for temp table with MergeTree table engine.

If we close clickhouse-client we see the log records like:

2026.01.11 20:25:19.206565 [ 5236 ] {} <Debug> TCP-Session-c5d2ceb5-cfdc-4083-b3db-12b82c52f270: c5d2ceb5-cfdc-4083-b3db-12b82c52f270 Logout, user_id: 94309d50-4f52-5250-31bd-74fecac179db
2026.01.11 20:25:19.206621 [ 5236 ] {} <Trace> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: waiting for locks.
2026.01.11 20:25:19.206648 [ 5236 ] {} <Trace> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: removing data parts (count 1) from filesystem.
2026.01.11 20:25:19.206662 [ 5236 ] {} <Debug> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): Removing 1 parts from filesystem (serially): Parts: [all_1_1_0]
2026.01.11 20:25:19.206937 [ 5236 ] {} <Trace> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: removing all data parts from memory.
2026.01.11 20:25:19.206944 [ 5236 ] {} <Information> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: clearing temporary directories
2026.01.11 20:25:19.207017 [ 5236 ] {} <Information> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: remove format_version.txt, detached, moving and write ahead logs
2026.01.11 20:25:19.207087 [ 5236 ] {} <Information> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: removing table directory recursive to cleanup garbage
2026.01.11 20:25:19.207116 [ 5236 ] {} <Trace> _temporary_and_external_tables.`_tmp_587e431c-9d0f-470b-9ba4-c02218fcf9e9` (587e431c-9d0f-470b-9ba4-c02218fcf9e9): dropAllData: done.

BTW src/Storages/MergeTree/MergeTreeData.cpp:

            LOG_INFO(log, "dropAllData: removing table directory recursive to cleanup garbage");
            disk->removeRecursive(relative_data_path);

Temporary tables live in user sessions. So it's OK.

But if instead of closing clickhouse-client we drop ClickHouse server by Ctrl+C then the folder /var/lib/clickhouse/data/_temporary_and_external_tables/ will stay on file system with all temp tables. On server's startup this garbage should be removed.

So from one side it's clear that temporary tables live in database which is DatabaseMemory and it should not have any data on disk. But it's quite tricky/not so simple with temporary tables (as even they were implemented before my PR). BTW we could also explore how external tables are saved in /var/lib/clickhouse/data/_temporary_and_external_tables/ .

@vdimir
Copy link
Copy Markdown
Member

vdimir commented Mar 16, 2026

Reverted, see comment #94003 (comment)

@rvasin may you create a PR with comment to db_disk->removeDirectoryIfExists in DatabaseMemory::removeDataPath ?

@vdimir vdimir added the post-approved Approved, but after the PR is merged. label Mar 16, 2026
@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Mar 16, 2026

Reverted, see comment #94003 (comment)

@rvasin may you create a PR with comment to db_disk->removeDirectoryIfExists in DatabaseMemory::removeDataPath ?

@vdimir Do you mean it's needed to create PR with commenting the code?

void DatabaseMemory::removeDataPath(ContextPtr)
{
    // auto db_disk = getDisk();
    // db_disk->removeDirectoryIfExists(data_path);
}

@vdimir
Copy link
Copy Markdown
Member

vdimir commented Mar 16, 2026

@rvasin Sorry for not being clear, I meant adding a comment explaining why this code is needed, but only if you have the context, of course.

@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Mar 16, 2026

@vdimir I will create a PR with comment about this code in few minutes.

@rvasin
Copy link
Copy Markdown
Contributor

rvasin commented Mar 16, 2026

The problem of removeDirectoryIfExists in this PR is that removes only an empty directory. But in in case of memory databases startup data path may contain garbage so removeRecursive should be used. Ideally it should be some non-recursive variant.

@vdimir vdimir added post-rejected Should've been rejected, but got merged. and removed post-approved Approved, but after the PR is merged. labels Mar 16, 2026
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Why can the directory for memory databases contain garbage?

@azat
Copy link
Copy Markdown
Member

azat commented Mar 25, 2026

If the server was not shutted down properly then there can be leftovers from the CREATE TEMPORARY TABLE, i.e.

$ ll .server/data/_temporary_and_external_tables/_tmp_0247ac83%2D8ffc%2D4196%2Dad88%2D7c8eda9aec76/
total 8
drwxr-x--- 2 azat azat 4096 Mar 25 14:47 detached
-rw-r----- 1 azat azat    1 Mar 25 14:47 format_version.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

post-rejected Should've been rejected, but got merged. pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants