WIP Add Sorting Optimization with General Data Hints#48800
WIP Add Sorting Optimization with General Data Hints#48800TrueAstralpirate wants to merge 73 commits intoClickHouse:masterfrom
Conversation
934d43c to
65b6d19
Compare
f687ced to
76ac6ee
Compare
4c6195e to
39f32a4
Compare
|
This is an automated comment for commit 2cae61d with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
69e705a to
f99b81a
Compare
601f7a9 to
073e982
Compare
| .returns_single_stream = false, | ||
| .preserves_number_of_streams = true, | ||
| .preserves_sorting = false, | ||
| .preserves_data_hints = false, |
There was a problem hiding this comment.
I'm not sure how array join could change min/max values?
| .returns_single_stream = false, | ||
| .preserves_number_of_streams = true, | ||
| .preserves_sorting = true, | ||
| .preserves_data_hints = false, |
| .returns_single_stream = true, | ||
| .preserves_number_of_streams = false, | ||
| .preserves_sorting = false, | ||
| .preserves_data_hints = false, |
There was a problem hiding this comment.
I guess it preserves for columns that are present in output header
| getTraits(actions_dag_, input_stream_.header, input_stream_.sort_description)) | ||
| , actions_dag(actions_dag_) | ||
| { | ||
| /// For now we only preserve hints of untouched columns |
There was a problem hiding this comment.
this comment should be by the function definition/implementation or it will start lying as soon as smth change
| , filter_column_name(std::move(filter_column_name_)) | ||
| , remove_filter_column(remove_filter_column_) | ||
| { | ||
| for (const auto & output_node_ptr : actions_dag->getOutputs()) |
There was a problem hiding this comment.
it seems not enough, because inside the action dag might happen all sorts of things. including aliases and some non-monotonic functions.
so I think we should first check that there is nothing harmful and if so just apply it to lower and upper bound to see what they'll become.
There was a problem hiding this comment.
But it is a filter, not an expression, how can I "apply" it to something? It will return either True or False.
What's going on here: inside updateDataHintsWithFilterActionsDAG I find INPUT columns (not ALIAS), and then try to build restrictions on such columns via <, >, <=, >= predicates with AND.
| output_stream->hints = input_streams[0].hints; | ||
| unionJoinDataHints(output_stream->hints, input_streams[1].hints, join->getTableJoin()); |
There was a problem hiding this comment.
just would be more accurate
| output_stream->hints = input_streams[0].hints; | |
| unionJoinDataHints(output_stream->hints, input_streams[1].hints, join->getTableJoin()); | |
| output_stream->hints = unionJoinDataHints(input_streams[0].hints, input_streams[1].hints, join->getTableJoin()); |
| .returns_single_stream = should_produce_results_in_order_of_bucket_number, | ||
| .preserves_number_of_streams = false, | ||
| .preserves_sorting = false, | ||
| .preserves_data_hints = false, |
There was a problem hiding this comment.
also should preserve for key columns
| prewhere_info, | ||
| enable_vertical_final); | ||
|
|
||
| /// Build data hints for output stream |
| .returns_single_stream = true, | ||
| .preserves_number_of_streams = false, | ||
| .preserves_sorting = false, | ||
| .preserves_data_hints = false, |
There was a problem hiding this comment.
should preserve for key columns
| updateOutputDataHints(); | ||
| } | ||
|
|
||
| void UnionStep::updateOutputDataHints() |
There was a problem hiding this comment.
does it really deserves a separate member function?
|
I think it would be valuable now to support propagation of hints through more steps to get more coverage in tests. |
| auto * rollup_step = typeid_cast<RollupStep *>(node->step.get()); | ||
| if (rollup_step) | ||
| return 0; | ||
|
|
||
| auto * cube_step = typeid_cast<CubeStep *>(node->step.get()); | ||
| if (cube_step) | ||
| return 0; | ||
|
|
||
| auto * totals_having_step = typeid_cast<TotalsHavingStep *>(node->step.get()); | ||
| if (totals_having_step) | ||
| return 0; | ||
|
|
| if (aggregating_node->children.size() != 1) | ||
| return 0; |
There was a problem hiding this comment.
it is not possible afaik, could be converted to chassert(...)
|
|
||
| } | ||
|
|
||
| size_t tryReduceAggregationKeysSize(QueryPlan::Node * node, QueryPlan::Nodes & nodes) |
There was a problem hiding this comment.
pls create a comment describing the transformation. there are tools like https://asciiflow.com/ that allow to create a visual scheme using ascii charset.
in general we like to write comments.
|
|
||
| if (type_name == "UInt8" || type_name == "Int8" || type_name == "UInt16" || type_name == "Int16") | ||
| { | ||
| new_aggregating_keys.push_back(old_key); |
There was a problem hiding this comment.
looking at l.155-156 it seems that new_aggregating_keys and changed_aggregating_keys are supposed to be of the same size. but it will not be true if we got into this branch, right?
| predecessor_node.children = {next_node}; | ||
| predecessor_node.step = std::make_unique<ExpressionStep>(input_stream, predecessor_dag); | ||
|
|
||
| std::unordered_map<std::string, std::string> changed_to_new_key_mapping; |
There was a problem hiding this comment.
| std::unordered_map<std::string, std::string> changed_to_new_key_mapping; | |
| std::unordered_map<std::string, std::string> changed_to_new_key_mapping; | |
| chassert(changed_aggregating_keys.size() == new_aggregating_keys.size()); |
|
|
||
| changed_data_types.push_back(key_column_type); | ||
| ColumnWithTypeAndName const_column; | ||
| if (hint.lower_boundary->getTypeName() == "Int64") |
There was a problem hiding this comment.
what if it is some other signed type?
|
|
||
| ActionsDAG::NodeRawConstPtrs children = {node_to_remove, hint_node}; | ||
| auto plus_function = FunctionFactory::instance().get("plus", Context::getGlobalContextInstance()); | ||
| dag->addOrReplaceInOutputs(dag->addCast(dag->addFunction(plus_function, children, "__uncasted_" + changed_key), changed_data_type, changed_key)); |
There was a problem hiding this comment.
so, in the output header we will have original columns names back, right?
|
|
||
| void AggregatingStep::updateParams(const Names & new_keys, const std::unordered_map<std::string, std::string> & changed_to_new_key_mapping) | ||
| { | ||
| params.keys = new_keys; |
There was a problem hiding this comment.
| sorting_step->updateInputStream(predecessor_node.step->getOutputStream()); | ||
|
|
||
| sorting_node->children = {&predecessor_node}; | ||
|
|
There was a problem hiding this comment.
two points here:
- let's create a post-step that will drop
col - hint.mincolumn in order to preserve the original header - what sort description will the new expression step propagate? probably one with the
col - hint.mincolumn? again better to restore everything back to original sort description
| for (size_t i = 254; i > 0; --i) | ||
| cnt[i] += cnt[i + 1]; | ||
| cnt[0] += cnt[1]; |
There was a problem hiding this comment.
| for (size_t i = 254; i > 0; --i) | |
| cnt[i] += cnt[i + 1]; | |
| cnt[0] += cnt[1]; | |
| for (ssize_t i = 254; i >= 0; --i) | |
| cnt[i] += cnt[i + 1]; |
| memset(cnt, 0, sizeof(cnt)); | ||
|
|
||
| for (auto * it = begin; it != end; ++it) | ||
| cnt[data[*it]]++; |
There was a problem hiding this comment.
maybe we should write here at data[*it] +- 1 to avoid shifting array by one below.
|
Dear @nickitat, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
I'm curious to learn about this PR, but I don't know what it means. What's a short summary of what this PR is doing? How will it improve performance, and for what kinds of queries? |
|
Dear @nickitat, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
todo