Skip to content

Grouping sets dev#26869

Merged
alexey-milovidov merged 35 commits intoClickHouse:masterfrom
taylor12805:grouping-sets-dev
Dec 17, 2021
Merged

Grouping sets dev#26869
alexey-milovidov merged 35 commits intoClickHouse:masterfrom
taylor12805:grouping-sets-dev

Conversation

@taylor12805
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
add grouping sets function, like GROUP BY grouping sets (a, b, (a, b))

Detailed description / Documentation draft:
Fix multiple grouping sets element issue from PR: #24172

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jul 28, 2021
@taylor12805
Copy link
Copy Markdown
Contributor Author

@KochetovNicolai Hi, please take a look at my implementation if you are free

@taylor12805
Copy link
Copy Markdown
Contributor Author

I find this implementation has performance issue because in AggregatingStep pipeline is resize to 1 and fork to number of grouping sets, which drastically decrease the performance

@taylor12805
Copy link
Copy Markdown
Contributor Author

@KochetovNicolai Hi, I come up with another implementation in Query Execution phase, using grouping sets transformer. The performance of this implementation is much higher than the one in #24172, please take a look to check if it is ok.

@KochetovNicolai
Copy link
Copy Markdown
Member

@taylor12805 thank you! Will try to check it soon.

@novikd novikd self-assigned this Aug 29, 2021
@taylor12805 taylor12805 requested a review from novikd September 5, 2021 06:38
@taylor12805
Copy link
Copy Markdown
Contributor Author

There is bug when query on Distributed Table with grouping sets function, will try to fix it soon

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you don't need this variable anymore.

@taylor12805
Copy link
Copy Markdown
Contributor Author

@novikd Hi, I generate reference file 01883_with_grouping_sets.reference following doc: https://clickhouse.tech/docs/en/development/adding_test_queries/, I have no idea why the test fails, please give me a hint

@taylor12805 taylor12805 requested a review from novikd September 13, 2021 08:39
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Sep 28, 2021

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ novikd
❌ MaxTheHuman
You have signed the CLA already but the status is still pending? Let us recheck it.

This was referenced Nov 16, 2021
@alexey-milovidov alexey-milovidov merged commit 6b97af4 into ClickHouse:master Dec 17, 2021
/// It is needed for debug. See QueryPipelineProcessorsCollector.
Processors * collected_processors = nullptr;

Poco::Logger * log = &Poco::Logger::get("Pipe");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks unused.


Processors aggregating;

Poco::Logger * log = &Poco::Logger::get("AggregatingStep");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one also unused.

explicit ExtractedSettings(const Settings & settings_);
};

Poco::Logger * poco_log = &Poco::Logger::get("ExpressionAnalyzer");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused


UInt64 keys_vector_idx = 0;

Poco::Logger * log = &Poco::Logger::get("GroupingSetsTransform");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one is also unused.

@sevirov
Copy link
Copy Markdown
Contributor

sevirov commented Dec 17, 2021

Internal documentation ticket: DOCSUP-20353

Comment on lines +131 to +141
if (group_by_with_grouping_sets)
{
frame.surround_each_list_element_with_parens = true;
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << (s.one_line ? "" : " ") << "GROUPING SETS" << (s.hilite ? hilite_none : "");
s.ostr << " (";
s.one_line
? groupBy()->formatImpl(s, state, frame)
: groupBy()->as<ASTExpressionList &>().formatImplMultiline(s, state, frame);
s.ostr << ")";
frame.surround_each_list_element_with_parens = false;
}
Copy link
Copy Markdown
Member

@azat azat Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks odd:

SELECT
    key1,
    key2,
    sum(value)
FROM data
GROUP BY
    GROUPING SETS (
    ( key1),
    ( key2))

So looks like:

  • surround_each_list_element_with_parens should not be set
  • extra parenthesis is not required

So something like this:

if (group_by_with_grouping_sets)
{
    s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << (s.one_line ? "" : "    ") << "GROUPING SETS" << (s.hilite ? hilite_none : "");
    s.one_line
    ? groupBy()->formatImpl(s, state, frame)
    : groupBy()->as<ASTExpressionList &>().formatImplMultiline(s, state, frame);
}

And actually even newline between GROUP BY and GROUPING SETS looks reduntant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra parenthesis is not required

Why? This is part of GROUPING SETS syntax.

But I agree I should remove new line before GROUPING SETS and redundant whitespace before the first element in grouping set.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is part of GROUPING SETS syntax.

I don't see why they are required, both queries works fine:

  • SELECT key1, key2, sum(value) FROM data GROUP BY GROUPING SETS (( key1), ( key2))
  • SELECT key1, key2, sum(value) FROM data GROUP BY GROUPING SETS key1, key2

Plus GROUP BY keys are not surrounded with extra parenthesis.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because grouping sets support several grouping commands.
This is a valid syntax:

GROUPING SETS ((key1), (key2), (key1, key2), ())

It should work similar to Postgres.

Copy link
Copy Markdown
Member

@azat azat Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for clarifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants