Control cache downloads to avoid negative optimization of local caches #37516
Merged
kssenii merged 13 commits into ClickHouse:master on May 27, 2022
kssenii
reviewed
May 25, 2022
kssenii
reviewed
May 25, 2022
kssenii
approved these changes
May 26, 2022
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Currently, ClickHouse downloads every remote file it reads into the local cache, even files that are read only once. This causes frequent I/O on the local disk. In some scenarios that I/O is unnecessary and makes queries slower than they would be without the cache (a "negative optimization"). As shown in the figure below, when we run SSB Q1-Q4 with the cache enabled, performance regresses.
To address this, the cache layer now records the data access trend within an access frame, using an LRU queue. Only frequently accessed cache blocks are saved locally. As shown in the figure, with this control in place the cache no longer causes a significant regression, while hot data can still be cached locally.
The threshold can be set in the configuration file. It specifies how many times a piece of data must be accessed before it is cached; the default value is 0, which means data is cached on first access.
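
The gating logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the class name `AccessTracker`, the method `should_cache`, and the `max_records` bound on the LRU queue are all hypothetical names chosen for the example. It assumes a block is downloaded to the local cache only once it has already been seen `threshold` times, so a threshold of 0 keeps the previous behavior.

```python
from collections import OrderedDict


class AccessTracker:
    """Hypothetical sketch: decide whether a remote block is hot enough to cache."""

    def __init__(self, threshold=0, max_records=1024):
        self.threshold = threshold      # prior accesses required before caching
        self.max_records = max_records  # bound on the LRU queue of access records
        self._hits = OrderedDict()      # block key -> number of prior accesses

    def should_cache(self, key):
        # A threshold of 0 means every block is cached on first access.
        if self.threshold == 0:
            return True

        # Remove the record so that re-inserting it marks it most recently used.
        prior = self._hits.pop(key, 0)
        if prior >= self.threshold:
            # Hot block: cache it. The record is dropped because the block
            # is assumed to be served from the local cache from now on.
            return True

        self._hits[key] = prior + 1
        # Evict the least recently used records once the queue is full.
        while len(self._hits) > self.max_records:
            self._hits.popitem(last=False)
        return False


tracker = AccessTracker(threshold=2)
decisions = [tracker.should_cache("part_1/data.bin") for _ in range(3)]
print(decisions)  # the first two reads bypass the cache, the third is cached
```

Bounding the record queue keeps the bookkeeping small: a block that is read once and never again eventually falls out of the queue, so it neither gets cached nor occupies memory indefinitely.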