Skip to content

Local cache for remote filesystem#33717

Merged
kssenii merged 69 commits intoClickHouse:masterfrom
kssenii:local-cache-for-remote-fs
Mar 11, 2022
Merged

Local cache for remote filesystem#33717
kssenii merged 69 commits intoClickHouse:masterfrom
kssenii:local-cache-for-remote-fs

Conversation

@kssenii
Copy link
Member

@kssenii kssenii commented Jan 17, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add local cache for disk s3. Closes #28961.

Detailed description / Documentation draft:
.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jan 17, 2022
@yiguolei
Copy link
Contributor

If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this?

@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from 7eaf6a5 to 41f4f33 Compare January 18, 2022 09:45
@kssenii
Copy link
Member Author

kssenii commented Jan 18, 2022

@yiguolei

If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this?

Thanks, you are right, will make a solution to avoid this.

@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from 41f4f33 to 31d2749 Compare January 18, 2022 11:51
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from 31d2749 to a566099 Compare January 18, 2022 21:42
@xiedeyantu
Copy link
Contributor

xiedeyantu commented Jan 21, 2022

I compiled using this branch, but I made an error at startup.
Application: DB::Exception: Cannot read all data. Bytes read: 3. Bytes expected: 4.

@kssenii
Copy link
Member Author

kssenii commented Jan 21, 2022

It is not ready yet. Moreover I did not push the latest version in this PR.
And it is in draft state now, you can test it when it is not a draft.

@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from b5e12e1 to 36a41ac Compare January 22, 2022 17:43
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from fa0780a to 710bba8 Compare January 23, 2022 20:49
@kant777
Copy link

kant777 commented Jan 24, 2022

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jan 24, 2022

@kant777 LRU is obviously not the best caching algorithm, but it is pointless to implement anything else until you have reproducible comparison mechanism to evaluate different algorithms on realistic workload.

Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading, but currently I see no big issues. Discussed a lot in TG chat. Let's continue.


bool LRUFileCache::tryReserve(const Key & key_, size_t offset_, size_t size, [[maybe_unused]] std::lock_guard<std::mutex> & cache_lock)
{
auto removed_size = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check corner case when we are trying to reserve more data than the whole cache size?

@kssenii kssenii force-pushed the local-cache-for-remote-fs branch 2 times, most recently from 1589782 to 8a7b4fa Compare January 24, 2022 22:34
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from 8a7b4fa to aef99de Compare January 24, 2022 22:46
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from 569dfe8 to 20eb192 Compare March 9, 2022 12:38
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from c717854 to aab3de7 Compare March 9, 2022 15:13
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from e7f7298 to bd68d1d Compare March 10, 2022 10:02
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch from c70f7f5 to c86955d Compare March 10, 2022 19:22
@kssenii
Copy link
Member Author

kssenii commented Mar 11, 2022

Stress test (thread, actions) — Sanitizer assert (in stderr.log)

Not related to changes:

WARNING: ThreadSanitizer: heap-use-after-free (pid=607)
  Atomic write of size 8 at 0x7b48006c01f0 by thread T243:
    #0 __tsan_atomic64_fetch_add <null> (clickhouse+0xad0a815)
    #1 long std::__1::__cxx_atomic_fetch_add<long>(std::__1::__cxx_atomic_base_impl<long>*, long, std::__1::memory_order) obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:1050:12 (clickhouse+0xad9d56a)
    #2 std::__1::__atomic_base<long, true>::fetch_add(long, std::__1::memory_order) obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:1719:17 (clickhouse+0xad9d56a)
    #3 MemoryTracker::allocImpl(long, bool, MemoryTracker*) obj-x86_64-linux-gnu/../src/Common/MemoryTracker.cpp:118:35 (clickhouse+0xad9d56a)
    #4 MemoryTracker::allocImpl(long, bool, MemoryTracker*) obj-x86_64-linux-gnu/../src/Common/MemoryTracker.cpp:239:22 (clickhouse+0xad9d940)

Stateless tests flaky check (address, actions) — Timeout, fail: 27, passed: 390

Those which failed - failed because of a timeout in stateless tests flakey check run.

@kssenii kssenii merged commit 818459b into ClickHouse:master Mar 11, 2022
@alexey-milovidov
Copy link
Member

VFS cache is intended to be composable. So, you define caching layer on top of another VFS. So you can have client-side encrypted data on s3 and choose between caching encrypted and unencrypted data. Now I see that s3 cache configuration is duplicating s3 configuration 😿

I will have to say that configuration is preliminary and the feature is in preview.

if (allow_non_strict_checking)
return "None";

throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cannot use cache without query id");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be this a LOGICAL_ERROR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks, I will change in those places.
Initially I started making it REMOTE_FS_OBJECT_CACHE_ERROR instead of LOGICAL_ERROR, because I wanted to check certain errors for EXPECT_THROW in unit tests, but then it would not work in debug build unit tests, so I started to not use logical error...

std::lock_guard segment_lock(mutex);

if (downloader_id.empty())
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, why not LOGICAL_ERROR?

throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader");

if (getCallerId() != downloader_id)
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Downloader can be reset only by downloader");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, and in few other places in this file for similar exceptions.

namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
{extern const int BAD_ARGUMENTS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Local cache for remote filesystem (RFC)

9 participants