Skip to content

Make cache composable, allow not to evict certain files (regarding .idx, .mrk, ..), delete old cache version#36171

Merged
kssenii merged 121 commits intoClickHouse:masterfrom
kssenii:make-cache-composable
Aug 18, 2022
Merged

Make cache composable, allow not to evict certain files (regarding .idx, .mrk, ..), delete old cache version#36171
kssenii merged 121 commits intoClickHouse:masterfrom
kssenii:make-cache-composable

Conversation

@kssenii
Copy link
Copy Markdown
Member

@kssenii kssenii commented Apr 12, 2022

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Make cache composable, allow not to evict certain files (regarding idx, mrk, ..), delete old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because cache configuration changes and in order for cache to work need to update the config file. Old cache will still be used with new configuration. The server will startup fine with the old cache configuration. Closes #36140. Closes #37889.

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backward-incompatible Pull request with backwards incompatible changes label Apr 12, 2022
@kssenii kssenii force-pushed the make-cache-composable branch from e41e30c to 37242a0 Compare April 12, 2022 14:55
@azat azat mentioned this pull request Apr 20, 2022
@kssenii kssenii force-pushed the make-cache-composable branch from 6bbed28 to 0c43b7b Compare April 26, 2022 14:11
This was referenced Apr 28, 2022
@kssenii kssenii force-pushed the make-cache-composable branch from 58a709c to e501a5a Compare May 19, 2022 13:55
@kssenii kssenii force-pushed the make-cache-composable branch 2 times, most recently from 9fa9d02 to 3a087f5 Compare May 23, 2022 20:06
@kssenii kssenii force-pushed the make-cache-composable branch from 688eab1 to 0ff5609 Compare August 10, 2022 14:55
@kssenii kssenii force-pushed the make-cache-composable branch from 2aabe80 to 8faaa6d Compare August 11, 2022 09:00
Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge amount of work. It's hard to do really detailed review. But in general, it's a big improvement in flexibility.

return offset;
if (file_segments_holder && current_file_segment_it != file_segments_holder->file_segments.end())
{
// auto & file_segments = file_segments_holder->file_segments;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to leave this commented code, because it should be a good improvement to reduce number of seeks, but it is very complex and need to be done very carefully and I will not do it right now, but having it there will remind that it needs to be done.
Or I can remove this commented code and add a TODO if it is better.


if (allow_seeks)
{
if (whence != SEEK_SET && whence != SEEK_CUR)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this check out of if.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when !allow_seeks only SEEK_SET is allowed. So current way seems more correct.

{
return Range{
.left = static_cast<size_t>(offset),
.right = read_until_position ? std::optional{read_until_position - 1} : std::nullopt
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why - 1? Included?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. A little inconsistent, but it happened. Because used as included in http read buffer.

};

if (objects.size() != 1)
throw Exception(ErrorCodes::CANNOT_USE_CACHE, "Unable to read multiple objects, support not added");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw Exception(ErrorCodes::CANNOT_USE_CACHE, "Unable to read multiple objects, support not added");
throw Exception(ErrorCodes::CANNOT_USE_CACHE, "Read of multiple objects is unsupported");

But does it mean that *Log tables (which do appends) will not work?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think I can remove this exception and multiple objects will work fine, just cache key will be deduces from the first path only. Anyway this logic is not applied for s3 and for it we have buffer->isIntegratedWithFIlesystemCache() == 1 and another implementation is used. For non-s3 I should add a test for *Log.

@kssenii kssenii force-pushed the make-cache-composable branch from 3b4a042 to 52832ea Compare August 16, 2022 09:52
@kssenii kssenii force-pushed the make-cache-composable branch from f10b9a0 to 93816e7 Compare August 16, 2022 13:12
@tavplubix
Copy link
Copy Markdown
Member

Changelog entry should explain what exactly is incompatible and how to do upgrade

@kssenii kssenii force-pushed the make-cache-composable branch from 54b68f7 to d193eee Compare August 16, 2022 22:16
@kssenii kssenii force-pushed the make-cache-composable branch from bab6dd1 to f9b6091 Compare August 17, 2022 13:34
@kssenii
Copy link
Copy Markdown
Member Author

kssenii commented Aug 18, 2022

Stress test (debug) — Killed by signal (in clickhouse-server.log)

Logical error: 'Cannot find column in(status, _subquery8) in ActionsDAG result' -- not related to changes #40336

Stress test (memory) — Killed by signal (in clickhouse-server.log)

the same <Fatal> : Logical error: 'Cannot find column _subquery499 in ActionsDAG result'.

@kssenii kssenii merged commit d8fd1f9 into ClickHouse:master Aug 18, 2022
@tavplubix
Copy link
Copy Markdown
Member

@kssenii, Stateless tests flaky check (address) failure was related, you merged PR with flaky test

/// This is a workaround of a read pass EOF bug in linux kernel with pread()
if (file_size.has_value() && file_offset_of_buffer_end >= *file_size)
return false;
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a check for indentation to Style Check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backward-incompatible Pull request with backwards incompatible changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use-of-uninitialized-value in DB::TCPHandler Logical error: Cache writer not initialized

5 participants