
Parallel wide parts writer #14150

Closed
excitoon wants to merge 9 commits into ClickHouse:master from excitoon-favorites:parallelwidewrite

Conversation

@excitoon
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one)

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Attempt to make wide parts faster in S3.

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Aug 27, 2020
@excitoon excitoon force-pushed the parallelwidewrite branch 2 times, most recently from 62fc420 to 4ede75b Compare September 5, 2020 17:34
@excitoon excitoon changed the title Attempt to make wide parts faster in S3 Parallel wide parts writer Sep 11, 2020
@excitoon excitoon marked this pull request as ready for review September 11, 2020 14:36
@excitoon excitoon force-pushed the parallelwidewrite branch 6 times, most recently from 3c0b522 to 764f14a Compare September 11, 2020 22:12
Member

@azat azat left a comment


Do I understand correctly that this PR makes writes to MergeTree run in multiple threads, so that if I INSERT two separate blocks (i.e. run two INSERT queries), each will be processed in 16 threads?

I guess (though I don't know for sure) that this is fine for S3, but not for local filesystems.

So looks to me that:

  • the default should be 1 (or some auto-detection)
  • the thread pool should be static (maybe greater in size, but static), since right now you can use up all 10K threads pretty quickly
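The two suggestions above can be sketched roughly like this (a minimal illustration with a hypothetical helper name, not ClickHouse's actual API): the setting defaults to 1 so writes stay serial, a hypothetical value of 0 means "auto-detect", and the resulting count is what a single process-wide static pool would be sized with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper, not ClickHouse code: resolve the writer thread
// count from the setting. Default 1 keeps writes serial; 0 stands for
// "auto-detect" from the hardware, as the review suggests.
inline size_t effectiveWriterThreads(size_t setting_value)
{
    if (setting_value == 0)
        return std::max<size_t>(1, std::thread::hardware_concurrency());
    return setting_value;  // explicit values are taken as-is
}
```

A static, process-wide pool would then be constructed once with this size, rather than a fresh pool per INSERT block.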

@Akazz Akazz self-assigned this Sep 15, 2020
@excitoon excitoon force-pushed the parallelwidewrite branch 3 times, most recently from 6f0707b to c5ef8cb Compare September 30, 2020 14:03
@excitoon excitoon force-pushed the parallelwidewrite branch 2 times, most recently from 7bc20f3 to 9bd5965 Compare October 6, 2020 07:24
@excitoon excitoon marked this pull request as draft October 29, 2020 05:38
@excitoon
Contributor Author

excitoon commented Nov 30, 2020

@azat I switched the default number of threads back to 1, but it is not very clear to me why in some cases we can use separate threads, for example here:

pool.scheduleOrThrowOnError([&, part_index, thread_group = CurrentThread::getGroup()] {

Can you also clarify what the scope of the thread pool should be: global, per-table, or query-specific?

@excitoon excitoon marked this pull request as ready for review November 30, 2020 04:23
@azat
Member

azat commented Dec 2, 2020

but it is not very clear for me why in some cases we can use separate threads, for example here:

This is for SELECT query, and:

  • it respects the max_threads setting (so if I set max_threads=1, it will not do parallel loading; max_threads is also lowered to 1 in case of a simple SELECT with LIMIT)
  • it is not done for each block, while your version does this for each INSERT block; with multiple INSERTs into one table you will get number_of_inserts * merge_tree_writer_max_threads threads, plus there is also max_insert_threads, so 16 as the default value does not look sane to me
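The multiplication described above can be made concrete (an illustrative helper, not an actual ClickHouse function):

```cpp
#include <cassert>
#include <cstddef>

// Worst-case writer thread count when each concurrent INSERT block
// spawns its own pool: the per-block setting multiplies with the
// number of concurrent INSERTs.
inline size_t worstCaseWriterThreads(size_t concurrent_inserts, size_t writer_threads_per_block)
{
    return concurrent_inserts * writer_threads_per_block;
}
```

With the proposed default of 16, just ten concurrent INSERTs into one table already account for 160 writer threads, before max_insert_threads is even considered.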

Member

@azat azat left a comment


BTW, it looks like this can be covered by a test (at least for plain MergeTree, without S3), since those threads should be accounted for in query_log (after the suggested change below)

Member

@azat azat Dec 2, 2020


Using a thread pool to write columns in parallel to a regular filesystem looks like overkill if the block is small enough (I guess it is OK if MergeTree uses S3 as the underlying storage)

Member


Also, these threads will not be accounted as threads of the query; to make them accounted for, you need to wrap write_column_job with something like:

auto thread_group = CurrentThread::getGroup();
...

writing_thread_pool->scheduleOrThrowOnError([&write_column_job, thread_group]() {
    setThreadName("QueryPipelineEx");

    if (thread_group)
        CurrentThread::attachTo(thread_group);
    SCOPE_EXIT(
            if (thread_group)
                CurrentThread::detachQueryIfNotDetached();
    );

    write_column_job();
});

P.S. the code is completely untested

Contributor

@Akazz Akazz left a comment


Your PR currently calls for at least some tests for your setting merge_tree_writer_max_threads_per_block. Such tests could also explain this PR's purpose and highlight the problem that it solves.

As I understand it, with your changes it will be difficult to reason about how many threads query processing will take. I also do not believe that trying to parallelize data writes at this level is a good idea in general.

if (settings.max_threads_per_block != 1)
{
offset_columns_per_column.reserve(columns_list.size());
writing_thread_pool = std::make_unique<ThreadPool>(settings.max_threads_per_block);
Contributor


It turns out we create a separate thread pool each time a block is written to the DataPart. Meanwhile, the pool's size is defined by settings.max_threads_per_block, which presumably stays the same between such writes. So this object could instead be created once, somewhere in the enclosing code.
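A minimal sketch of this suggestion (FakePool and WideWriter are stand-ins, not the real ThreadPool or MergeTreeDataPartWriterWide): construct the pool once in the enclosing object and reuse it for every block write.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for ThreadPool; only records its size here.
struct FakePool
{
    explicit FakePool(size_t size_) : size(size_) {}
    size_t size;
};

// Stand-in for the writer: the pool is created once in the constructor
// and reused by every writeBlock() call, instead of being rebuilt
// inside each block write.
class WideWriter
{
public:
    explicit WideWriter(size_t max_threads_per_block)
    {
        if (max_threads_per_block != 1)
            pool = std::make_unique<FakePool>(max_threads_per_block);
    }

    // Returns how many threads this block's write would use.
    size_t writeBlock()
    {
        ++blocks_written;
        return pool ? pool->size : 1;  // no pool means serial writes
    }

    size_t blocks_written = 0;
    std::unique_ptr<FakePool> pool;
};
```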

Contributor


Also, note that this setting potentially ignores any limit on the number of execution threads prescribed by max_threads.

writing_thread_pool->wait();

// data_written = std::any_of(column_data_written.begin(), column_data_written.end(), std::identity());
data_written = std::find(column_data_written.begin(), column_data_written.end(), true) != column_data_written.end();
Contributor


What is the meaning of this data_written field now? What happens if any of the write tasks fails and some of the columns fail to be written?
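The failure mode behind this question can be sketched with plain std::async instead of the PR's ThreadPool (simulated code, assuming that waiting on the tasks rethrows a task's exception): one failing column leaves the per-column flags only partially set.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <vector>

// Simulated parallel column write: the task for `failing_column` throws,
// the others set their "written" flag. Returns false if any task failed.
// `written` uses int rather than vector<bool> so that distinct elements
// can be safely modified from different threads.
inline bool writeColumnsInParallel(std::vector<int> & written, size_t failing_column)
{
    std::vector<std::future<void>> tasks;
    for (size_t i = 0; i < written.size(); ++i)
        tasks.push_back(std::async(std::launch::async, [&written, i, failing_column]
        {
            if (i == failing_column)
                throw std::runtime_error("simulated I/O error");
            written[i] = 1;  // this column's data made it to storage
        }));

    bool ok = true;
    for (auto & task : tasks)
    {
        try { task.get(); }  // rethrows the task's exception
        catch (const std::runtime_error &) { ok = false; }
    }
    return ok;
}
```

In the PR's terms, data_written would then end up true even though the part is incomplete, which is presumably the reviewer's concern.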

}
else

column_data_written[i] = written;
Contributor


Can this column_data_written[i] simply be captured by reference under the alias written (in a similar manner to offset_columns)?
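This suggestion looks roughly like the following simplified sketch; the real code schedules write_column_job on a pool, and concurrently written flags would need e.g. char or atomic elements rather than vector<bool>, whose operator[] returns a proxy, not a real reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: capture the per-column flag by reference under the alias
// `written`, the same way offset_columns is captured, so the job writes
// through it directly instead of assigning column_data_written[i] after.
inline std::vector<int> runColumnJobs(size_t num_columns)
{
    std::vector<int> column_data_written(num_columns, 0);
    for (size_t i = 0; i < num_columns; ++i)
    {
        auto & written = column_data_written[i];        // reference alias
        auto write_column_job = [&written] { written = 1; };
        write_column_job();  // pool.scheduleOrThrowOnError(...) in real code
    }
    return column_data_written;
}
```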

}

it = columns_list.begin();
for (size_t i = 0; i < columns_list.size(); ++i, ++it)
Contributor


Parallelizing the data write by columns might undermine the idea altogether in the case of a single fat column: parallelization would be completely pointless in that case.

@Akazz
Contributor

Akazz commented Jan 26, 2021

After internal discussion we decided to decline this implementation for a number of reasons (most of them are mentioned in the discussion above). The task of parallelizing I/O writes does not seem to belong to this entity (MergeTreeDataPartWriterWide).


Labels

pr-performance Pull request with some performance improvements


5 participants