Processors and storages by KochetovNicolai · Pull Request #7181 · ClickHouse/ClickHouse

KochetovNicolai · 2019-10-03T10:46:39Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

Improvement

Short description (up to few sentences):
Support for processors in MergeTree.

Fix MergeTreeReader. Fix MergeTreeBaseSelectProcessor. Better exception message for TreeExecutor. Added header_without_virtual_columns to MergeTreeBaseSelectProcessor. Fix MergeTreeReverseSelectProcessor. Fix MergeTreeDataSelectExecutor.

alesapin

I don't see any switch where we can disable the new behavior. Is that expected?

From my point of view, this transition code looks very complex :( Maybe we first need to rewrite some basic classes like ExpressionActions? Also, it's not clear why we pack processors into a two-dimensional array (Pipes) and then sometimes flatten them back. Constructions like pipe.back()->getInputs().front() are not convenient.

I need some clarification on my questions. After that I can make review again.

alesapin · 2019-10-14T15:12:33Z

dbms/src/DataStreams/ExecutionSpeedLimits.cpp

+void ExecutionSpeedLimits::throttle(size_t read_rows, size_t read_bytes, size_t total_rows, UInt64 total_elapsed_microseconds)
+{
+    if ((min_execution_speed || max_execution_speed || min_execution_speed_bytes ||
+         max_execution_speed_bytes || (total_rows && timeout_before_checking_execution_speed != 0)) &&


Better to compare all integer variables with zero explicitly, for consistency.
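
A minimal sketch of what that suggestion could look like. The struct is a stand-in that only borrows the field names from the diff above (the real types, in particular that of timeout_before_checking_execution_speed, may differ), and hasAnyLimit is a hypothetical helper, not a function from this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the members used in the condition above; not the real class.
struct SpeedLimitsSketch
{
    std::size_t min_execution_speed = 0;
    std::size_t max_execution_speed = 0;
    std::size_t min_execution_speed_bytes = 0;
    std::size_t max_execution_speed_bytes = 0;
    // Assumption: modelled as a plain integer here for simplicity.
    std::uint64_t timeout_before_checking_execution_speed = 0;
};

// Hypothetical helper: every integer is compared with zero explicitly,
// so all terms of the condition read the same way.
bool hasAnyLimit(const SpeedLimitsSketch & limits, std::size_t total_rows)
{
    return limits.min_execution_speed != 0
        || limits.max_execution_speed != 0
        || limits.min_execution_speed_bytes != 0
        || limits.max_execution_speed_bytes != 0
        || (total_rows != 0 && limits.timeout_before_checking_execution_speed != 0);
}
```

The logic is the same as in the diff; only the implicit integer-to-bool conversions are spelled out.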

alesapin · 2019-10-18T19:21:28Z

dbms/src/DataStreams/ExecutionSpeedLimits.cpp

+
+        if (elapsed_seconds > 0)
+        {
+            if (min_execution_speed && read_rows / elapsed_seconds < min_execution_speed)


I'd prefer to store read_rows / elapsed_seconds and read_bytes / elapsed in separate variables, like rows_per_second.
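
A rough sketch of that refactoring, under the assumption that elapsed_seconds is a floating-point value. checkMinSpeed and SpeedCheckSketch are hypothetical names used only for illustration; only rows_per_second comes from the comment itself.

```cpp
#include <cstddef>

// Hypothetical result type for the illustration.
struct SpeedCheckSketch
{
    bool too_slow_rows = false;
    bool too_slow_bytes = false;
};

// Hypothetical helper: the two rates are computed once and named,
// instead of repeating the division inside each comparison.
SpeedCheckSketch checkMinSpeed(
    std::size_t read_rows, std::size_t read_bytes, double elapsed_seconds,
    std::size_t min_execution_speed, std::size_t min_execution_speed_bytes)
{
    SpeedCheckSketch result;
    if (elapsed_seconds > 0)
    {
        const double rows_per_second = static_cast<double>(read_rows) / elapsed_seconds;
        const double bytes_per_second = static_cast<double>(read_bytes) / elapsed_seconds;

        result.too_slow_rows = min_execution_speed != 0
            && rows_per_second < static_cast<double>(min_execution_speed);
        result.too_slow_bytes = min_execution_speed_bytes != 0
            && bytes_per_second < static_cast<double>(min_execution_speed_bytes);
    }
    return result;
}
```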

alesapin · 2019-10-18T19:23:10Z

dbms/src/DataStreams/ExecutionSpeedLimits.cpp

+    }
+}
+
+void ExecutionSpeedLimits::throttle(size_t read_rows, size_t read_bytes, size_t total_rows, UInt64 total_elapsed_microseconds)


total_rows is unclear: is it the total rows already read, or the total rows still to be read?

alesapin · 2019-10-18T19:23:37Z

dbms/src/DataStreams/ExecutionSpeedLimits.h

+class ExecutionSpeedLimits
+{
+public:
+    size_t min_execution_speed = 0;


alesapin · 2019-10-18T19:24:09Z

dbms/src/DataStreams/ExecutionSpeedLimits.h

+{
+
+/// Limits for query execution speed.
+/// In rows per second.


alesapin · 2019-10-18T20:13:05Z

dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

    for (size_t i = 0; i < sort_columns_size; ++i)
        sort_description.emplace_back(header.getPositionByName(sort_columns[i]), 1, 1);

+    auto streams_to_merge = [&]()


Looks like we only need to_merge and pipes here.

alesapin · 2019-10-18T20:17:06Z

dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

+    auto it = to_merge.begin();
+    for (auto & input : merged_processor->getInputs())
+    {
+        connect(**it, input);


It seems like building n-to-n connections is a common operation. Maybe create a separate function?
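
One possible shape for such a helper, sketched with toy port types rather than the real processor classes. OutputPortSketch, InputPortSketch, connectSketch and connectPairwise are all hypothetical names; only the idea of pairing the i-th output with the i-th input is taken from the loop above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-ins for processor ports; not ClickHouse types.
struct InputPortSketch;
struct OutputPortSketch { InputPortSketch * peer = nullptr; };
struct InputPortSketch { OutputPortSketch * peer = nullptr; };

// Toy counterpart of connect(output, input): link the two ports to each other.
void connectSketch(OutputPortSketch & output, InputPortSketch & input)
{
    output.peer = &input;
    input.peer = &output;
}

// Hypothetical helper: connect the i-th output to the i-th input,
// refusing to proceed if the port counts do not match.
void connectPairwise(std::vector<OutputPortSketch> & outputs, std::vector<InputPortSketch> & inputs)
{
    if (outputs.size() != inputs.size())
        throw std::invalid_argument("connectPairwise: port counts differ");

    for (std::size_t i = 0; i < outputs.size(); ++i)
        connectSketch(outputs[i], inputs[i]);
}
```

Checking the sizes up front avoids walking an iterator past the end, which the hand-written loop would do if to_merge held fewer elements than the merged processor has inputs.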

alesapin · 2019-10-18T20:18:19Z

dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

+    }
+
+    Processors result;
+    result.reserve(2 * pipes.size() + 1);


Why do we need to put processors into pipes and then flatten them?

alesapin · 2019-10-18T20:21:29Z

dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp

    extern const int MEMORY_LIMIT_EXCEEDED;
 }

+static Block replaceTypes(Block && header, const MergeTreeData::DataPartPtr & data_part)


Seen this before.

alesapin · 2019-10-18T20:26:18Z

dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp

+            block.insert({result.columns[pos], name_and_type->type, name_and_type->name});
+
+        if (alias_actions)
+            alias_actions->execute(block);


In the future, will we execute all actions on the header and columns?

Yes, I suppose so.

KochetovNicolai · 2019-10-19T03:24:20Z

I don't see any switch where we can disable the new behavior. Is that expected?

Yes. I just don't want to have two versions of the code now.
There is still a switch for processors (the experimental_use_processors flag), but the part with storages will work through a wrapper.

alesapin

Pipe, TreeExecutorBlockInputStream and Transforms interfaces look good to me. Changes in SelectExecutor and high-level streams (Select and Sequential) also seem reasonable.

The diff in the low-level readers (MergeTreeReader and MergeTreeRangeReader) seems too complicated. We are just replacing the Block interface with Columns, but had to rewrite about 70% of the code in these classes :(

Also, we need to add comments about the temporary code.

alesapin · 2019-10-31T09:30:40Z

dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp

+        bool has_columns = false;
+        for (auto & column : columns)
+            if (column)
+                has_columns = true;


alesapin · 2019-10-31T09:37:52Z

dbms/src/Processors/Executors/TreeExecutorBlockInputStream.cpp

+    {
+        IProcessor * node = stack.top();
+
+        auto status = prepare_processor(node);


Maybe write this code here without a lambda? It seems to be used in only one place.

I think with lambda it's more readable.

alesapin · 2019-10-31T10:00:46Z

dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp

+        return columns;
    }

+    columns.resize(merge_tree_reader->getColumns().size());


Slightly confusing. Previously we didn't have Columns, so the meaning of the getColumns() method was unambiguous.

alesapin · 2019-10-31T10:02:12Z

dbms/src/Storages/MergeTree/StorageFromMergeTreeDataPart.h

+        auto pipes = MergeTreeDataSelectExecutor(part->storage).readFromParts(
+                {part}, column_names, query_info, context, max_block_size, num_streams);
+
+        BlockInputStreams streams;


Comments about temporary code?

alesapin · 2019-10-31T10:04:38Z

dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

    for (size_t i = 0; i < sort_columns_size; ++i)
        sort_description.emplace_back(header.getPositionByName(sort_columns[i]), 1, 1);

+    auto streams_to_merge = [&pipes]()


Temporary code. Need comment.

alesapin

.

KochetovNicolai added 6 commits September 13, 2019 11:59

Enable Processors by default.

4576e1f

Added TreeExecutor.

1335aa7

Added IStorage::readWithProcessors.

1f5e62d

Add processors to StorageMergeTree [WIP].

3c53dfd

Remove Block from RangeReader.

5108ebe

Update MergeTreeRangeReader.

b65fe57

KochetovNicolai added the pr-improvement label (Pull request with some product improvements) Oct 3, 2019

KochetovNicolai added 6 commits October 4, 2019 20:49

Update MergeTreeDataSelectExecutor.

1689576

Update TreeExecutor.

54d32da

Fix MergeTreeRangeReader.

e48f7fa

Fix MergeTreeReader. Fix MergeTreeBaseSelectProcessor. Better exception message for TreeExecutor. Added header_without_virtual_columns to MergeTreeBaseSelectProcessor. Fix MergeTreeReverseSelectProcessor. Fix MergeTreeDataSelectExecutor.

Added ExecutionSpeedLimits.

627d48c

Progress for MergeTreeSelectProcessor.

23069ca

Update QueryPipeline.

d4f11af

KochetovNicolai force-pushed the processors-and-storages branch from 4e53d59 to d4f11af Compare October 4, 2019 17:50

KochetovNicolai added 9 commits October 4, 2019 20:53

Merged with master.

95ec0f7

Fix progress callback for processors pipeline.

c7bb832

Try fix progressbar.

ea27918

Try fix progressbar.

eb2677c

Merge branch 'master' into processors-and-storages

9c5ae5f

Disable processors by default.

dea89cf

Fix MergeTreeSequentialBlockInputStream.

4728bdf

Fix MergeTreeSequentialBlockInputStream.

3780527

Added more comments.

ef14df4

KochetovNicolai marked this pull request as ready for review October 10, 2019 14:42

KochetovNicolai added 5 commits October 11, 2019 00:46

Merge branch 'master' into processors-and-storages

e48755d

Enable processors by default.

89dfe78

Merge branch 'master' into processors-3

7574883

Fix build.

7c25755

Fix build.

5364f76

alesapin reviewed Oct 18, 2019

KochetovNicolai added 3 commits October 21, 2019 18:16

Added Pipe class. Updated MergeTreeDataSelectExecutor.

f7d2e1b

Disable processors by default.

4ca83a8

Merged with master.

2893c35

github-actions bot added the comp-message-queues label (Message queue integrations: Kafka, RabbitMQ, NATS table engines for stream ingestion/egress) Oct 21, 2019

KochetovNicolai added 6 commits October 21, 2019 19:26

Review fixes.

2b334a4

Fix build.

e7ba48e

Fix build.

dad1e39

Try to fix AggregateFunctionGroupBitmap.

640da3f

Disable processors by default.

bcc4c2f

Added more comments.

9abab40

alesapin reviewed Oct 31, 2019

KochetovNicolai added 4 commits October 31, 2019 14:32

Review fixes.

a38124c

Merged with master

a80338e

Fix build.

1837841

Fix clang build.

be1ccaa

alesapin approved these changes Nov 1, 2019

KochetovNicolai merged commit 5bb47e2 into master Nov 1, 2019

KochetovNicolai mentioned this pull request Nov 1, 2019

Processors 3 #6933

Merged

alexey-milovidov mentioned this pull request May 29, 2020

Fix bug with Throttler and query speed estimation #11296

Merged
