Skip to content

Add ability to bypass storage metadata cache#89283

Merged
Michicosun merged 4 commits intomasterfrom
bypass_storage_metadata_cache
Oct 31, 2025
Merged

Add ability to bypass storage metadata cache#89283
Michicosun merged 4 commits intomasterfrom
bypass_storage_metadata_cache

Conversation

@Michicosun
Copy link
Copy Markdown
Member

@Michicosun Michicosun commented Oct 31, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Details

Storage metadata cache can be enabled using enable_shared_storage_snapshot_in_query. It was added to the query context instead of being part of the query analyzer/planner, which can lead to invalid usage of cached metadata where it should not be cached.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Oct 31, 2025

Workflow [PR], commit [eb4b029]

Summary:

job_name test_name status info comment
Integration tests (amd_asan, old analyzer, 4/6) failure
test_scheduler_cpu_preemptive/test.py::test_downscaling[cpu-slot-preemption-timeout-1ms] FAIL cidb, flaky
OOM in dmesg FAIL cidb

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 31, 2025
@azat azat self-assigned this Oct 31, 2025
@@ -628,7 +628,7 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree(
* Otherwise `metadata_version` for not first replica will be initialized with 0 by default.
*/
setInMemoryMetadata(metadata_snapshot->withMetadataVersion(metadata_version));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking into this again, I have another idea/question, maybe clearing the cache in setInMemoryMetadata will be less error prone? Not sure that it make sense for this fix, but interesting what you think

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will definitely work, but this will be a strange side effect of updating a storage field.

@Michicosun
Copy link
Copy Markdown
Member Author

CI: #88807

@Michicosun Michicosun added this pull request to the merge queue Oct 31, 2025
Merged via the queue into master with commit 4d51bf8 Oct 31, 2025
122 of 124 checks passed
@Michicosun Michicosun deleted the bypass_storage_metadata_cache branch October 31, 2025 23:44
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 1, 2025
*/
setInMemoryMetadata(metadata_snapshot->withMetadataVersion(metadata_version));
metadata_snapshot = getInMemoryMetadataPtr();
metadata_snapshot = getInMemoryMetadataPtr(/*bypass_metadata_cache=*/true);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a regression test covering this? I'm trying to understand the related issues. Perhaps we could address all of them by simply avoiding the cache for queries that involve DDL. The tricky part might be separating the contexts in CTAS queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants