Conversation
azat
left a comment
Do I understand correctly that this PR makes writes to MergeTree multi-threaded, so that if I INSERT two separate blocks (i.e. two INSERT queries), each will be processed in 16 threads?
I guess (although I don't know for sure) that this is fine for S3, but not for local filesystems.
So looks to me that:
- the default should be 1 (or some auto-detection)
- the thread pool should be static (perhaps larger in size, but static), since right now you can exhaust all 10K threads pretty quickly
@azat I switched the default number of threads back to 1, but it is not very clear to me why in some cases we can use separate threads, for example here: . Can you also clarify what the scope of the thread pool should be? Global, per-table, or query-specific?
This is for a SELECT query, and:
Using a thread pool to write columns in parallel to a regular filesystem looks like overkill if the block is small enough (I guess it is OK if MergeTree uses S3 as the underlying storage).
Also, these threads will not be accounted as threads of the query; to make them accounted, you need to wrap write_column_job with:
auto thread_group = CurrentThread::getGroup();
...
writing_thread_pool->scheduleOrThrowOnError([&write_column_job, thread_group]()
{
    setThreadName("QueryPipelineEx");
    if (thread_group)
        CurrentThread::attachTo(thread_group);
    SCOPE_EXIT(
        if (thread_group)
            CurrentThread::detachQueryIfNotDetached();
    );
    write_column_job();
});

P.S. The code is completely untested.
Akazz
left a comment
Your PR needs at least some tests for your setting merge_tree_writer_max_threads_per_block. Such tests could also explain the PR's purpose and highlight the problem it solves.
As I understand it, with your changes it will be difficult to reason about how many threads query processing will take. I also do not believe that trying to parallelize data writes at this level is a good idea in general.
if (settings.max_threads_per_block != 1)
{
    offset_columns_per_column.reserve(columns_list.size());
    writing_thread_pool = std::make_unique<ThreadPool>(settings.max_threads_per_block);
It turns out we create a separate thread pool on every write of a block to the DataPart, while the pool's size is defined by settings.max_threads_per_block, which probably does not change between such writes. So this object could be created once in the enclosing code.
Also, note that this setting potentially ignores any limit on the number of execution threads prescribed by max_threads.
writing_thread_pool->wait();

// data_written = std::any_of(column_data_written.begin(), column_data_written.end(), std::identity());
data_written = std::find(column_data_written.begin(), column_data_written.end(), true) != column_data_written.end();
What is the meaning of this data_written field now? What happens if any of the write tasks fails and some of the columns are not written?
}
else
...
column_data_written[i] = written;
Can this column_data_written[i] simply be captured by reference under the alias written (in the same manner as offset_columns)?
}
...
it = columns_list.begin();
for (size_t i = 0; i < columns_list.size(); ++i, ++it)
Parallelizing the data write by columns might undermine the idea altogether in the case of a single fat column: parallelization becomes completely pointless in that case.
After internal discussion we decided to decline this implementation for a number of reasons (most of them mentioned in the discussions above). The task of parallelizing IO writes does not seem to belong to this entity (MergeTreeDataPartWriterWide).
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one)
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Attempt to make wide parts faster in S3.