Grouping sets dev#26869
Conversation
|
@KochetovNicolai Hi, please take a look at my implementation if you are free |
|
I find this implementation has performance issue because in AggregatingStep pipeline is resize to 1 and fork to number of grouping sets, which drastically decrease the performance |
|
@KochetovNicolai Hi, I come up with another implementation in Query Execution phase, using grouping sets transformer. The performance of this implementation is much higher than the one in #24172, please take a look to check if it is ok. |
|
@taylor12805 thank you! Will try to check it soon. |
|
There is bug when query on Distributed Table with grouping sets function, will try to fix it soon |
There was a problem hiding this comment.
Seems like you don't need this variable anymore.
|
@novikd Hi, I generate reference file 01883_with_grouping_sets.reference following doc: https://clickhouse.tech/docs/en/development/adding_test_queries/, I have no idea why the test fails, please give me a hint |
|
|
2. Update formatAST to support grouping sets query with distributed table 3. modify astExpression to support function in grouping sets
b412dab to
3bcf915
Compare
| /// It is needed for debug. See QueryPipelineProcessorsCollector. | ||
| Processors * collected_processors = nullptr; | ||
|
|
||
| Poco::Logger * log = &Poco::Logger::get("Pipe"); |
|
|
||
| Processors aggregating; | ||
|
|
||
| Poco::Logger * log = &Poco::Logger::get("AggregatingStep"); |
| explicit ExtractedSettings(const Settings & settings_); | ||
| }; | ||
|
|
||
| Poco::Logger * poco_log = &Poco::Logger::get("ExpressionAnalyzer"); |
|
|
||
| UInt64 keys_vector_idx = 0; | ||
|
|
||
| Poco::Logger * log = &Poco::Logger::get("GroupingSetsTransform"); |
|
Internal documentation ticket: DOCSUP-20353 |
| if (group_by_with_grouping_sets) | ||
| { | ||
| frame.surround_each_list_element_with_parens = true; | ||
| s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << (s.one_line ? "" : " ") << "GROUPING SETS" << (s.hilite ? hilite_none : ""); | ||
| s.ostr << " ("; | ||
| s.one_line | ||
| ? groupBy()->formatImpl(s, state, frame) | ||
| : groupBy()->as<ASTExpressionList &>().formatImplMultiline(s, state, frame); | ||
| s.ostr << ")"; | ||
| frame.surround_each_list_element_with_parens = false; | ||
| } |
There was a problem hiding this comment.
Formatting looks odd:
SELECT
key1,
key2,
sum(value)
FROM data
GROUP BY
GROUPING SETS (
( key1),
( key2))So looks like:
surround_each_list_element_with_parensshould not be set- extra parenthesis is not required
So something like this:
if (group_by_with_grouping_sets)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << (s.one_line ? "" : " ") << "GROUPING SETS" << (s.hilite ? hilite_none : "");
s.one_line
? groupBy()->formatImpl(s, state, frame)
: groupBy()->as<ASTExpressionList &>().formatImplMultiline(s, state, frame);
}And actually even newline between GROUP BY and GROUPING SETS looks reduntant.
There was a problem hiding this comment.
extra parenthesis is not required
Why? This is part of GROUPING SETS syntax.
But I agree I should remove new line before GROUPING SETS and redundant whitespace before the first element in grouping set.
There was a problem hiding this comment.
Why? This is part of GROUPING SETS syntax.
I don't see why they are required, both queries works fine:
SELECT key1, key2, sum(value) FROM data GROUP BY GROUPING SETS (( key1), ( key2))SELECT key1, key2, sum(value) FROM data GROUP BY GROUPING SETS key1, key2
Plus GROUP BY keys are not surrounded with extra parenthesis.
There was a problem hiding this comment.
Because grouping sets support several grouping commands.
This is a valid syntax:
GROUPING SETS ((key1), (key2), (key1, key2), ())
It should work similar to Postgres.
There was a problem hiding this comment.
Got it, thanks for clarifications.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
add grouping sets function, like GROUP BY grouping sets (a, b, (a, b))
Detailed description / Documentation draft:
Fix multiple grouping sets element issue from PR: #24172