Local cache for remote filesystem#33717
Conversation
|
If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this? |
7eaf6a5 to
41f4f33
Compare
Thanks, you are right, will make a solution to avoid this. |
41f4f33 to
31d2749
Compare
31d2749 to
a566099
Compare
|
I compiled using this branch, but I made an error at startup. |
|
It is not ready yet. Moreover I did not push the latest version in this PR. |
b5e12e1 to
36a41ac
Compare
fa0780a to
710bba8
Compare
|
@kant777 LRU is obviously not the best caching algorithm, but it is pointless to implement anything else until you have reproducible comparison mechanism to evaluate different algorithms on realistic workload. |
alesapin
left a comment
There was a problem hiding this comment.
Still reading, but currently I see no big issues. Discussed a lot in TG chat. Let's continue.
|
|
||
| bool LRUFileCache::tryReserve(const Key & key_, size_t offset_, size_t size, [[maybe_unused]] std::lock_guard<std::mutex> & cache_lock) | ||
| { | ||
| auto removed_size = 0; |
There was a problem hiding this comment.
Maybe check corner case when we are trying to reserve more data than the whole cache size?
1589782 to
8a7b4fa
Compare
8a7b4fa to
aef99de
Compare
569dfe8 to
20eb192
Compare
c717854 to
aab3de7
Compare
e7f7298 to
bd68d1d
Compare
c70f7f5 to
c86955d
Compare
Not related to changes:
Those which failed - failed because of a timeout in stateless tests flakey check run. |
|
VFS cache is intended to be composable. So, you define caching layer on top of another VFS. So you can have client-side encrypted data on s3 and choose between caching encrypted and unencrypted data. Now I see that s3 cache configuration is duplicating s3 configuration 😿 I will have to say that configuration is preliminary and the feature is in preview. |
| if (allow_non_strict_checking) | ||
| return "None"; | ||
|
|
||
| throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cannot use cache without query id"); |
There was a problem hiding this comment.
Shouldn't be this a LOGICAL_ERROR?
There was a problem hiding this comment.
Yeah, thanks, I will change in those places.
Initially I started making it REMOTE_FS_OBJECT_CACHE_ERROR instead of LOGICAL_ERROR, because I wanted to check certain errors for EXPECT_THROW in unit tests, but then it would not work in debug build unit tests, so I started to not use logical error...
| std::lock_guard segment_lock(mutex); | ||
|
|
||
| if (downloader_id.empty()) | ||
| throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader"); |
| throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader"); | ||
|
|
||
| if (getCallerId() != downloader_id) | ||
| throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Downloader can be reset only by downloader"); |
There was a problem hiding this comment.
Also here, and in few other places in this file for similar exceptions.
| namespace ErrorCodes | ||
| { | ||
| extern const int BAD_ARGUMENTS; | ||
| {extern const int BAD_ARGUMENTS; |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add local cache for disk s3. Closes #28961.
Detailed description / Documentation draft:
.