Skip to content

Add operator GROUPING SETS#24172

Closed
MaxTheHuman wants to merge 27 commits intoClickHouse:masterfrom
MaxTheHuman:grouping-sets-dev
Closed

Add operator GROUPING SETS#24172
MaxTheHuman wants to merge 27 commits intoClickHouse:masterfrom
MaxTheHuman:grouping-sets-dev

Conversation

@MaxTheHuman
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 16, 2021
@l1t1
Copy link
Copy Markdown

l1t1 commented May 17, 2021

nice job

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented May 24, 2021

The tests are missing, did you forget git add?

@KochetovNicolai KochetovNicolai self-assigned this May 25, 2021
break;

size_t first_column_to_add = (i == 0 ? 1 : 0);
auto adding_column_action = ActionsDAG::makeAddingColumnActions(pipeline.getHeader().getByPosition(params.keys_vector[first_column_to_add][0]));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's suspicious.
What if we have two sets with the same prefix?
For (a, b), (a, c) we add const column a for both, replacing normal column with default one.

expression_transforms.push_back(std::make_shared<ExpressionTransform>(aggregating_transforms[i]->getOutputs().front().getHeader(), actions));

}
pipeline.addParallelTransforms(std::move(aggregating_transforms));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same could be done with addSimpleTransform, cause aggregating one input and one output port.
But maybe this method will be useful later, when we remove pipeline.resize(1);

Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far so good.

Also, we need to remove all debug output

@taylor12805
Copy link
Copy Markdown
Contributor

taylor12805 commented Jul 8, 2021

I try to reproduce the test case from website: https://oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets

CREATE TABLE default.dimension_tab (fact_1_idInt32,fact_2_idInt32,fact_3_idInt32,fact_4_idInt32,sales_valueDecimal(9, 2) ) ENGINE = MergeTree ORDER BY tuple() SETTINGS index_granularity = 8192

INSERT INTO default.dimension_tab SELECT rand32(number) % 2 + 1 AS fact_1_id, rand32(number) % 5 + 1 AS fact_2_id, rand32(number) % 10 + 1 AS fact_3_id, rand32(number) % 10 + 1 AS fact_4_id, rand32(number) % 100 AS sales_value FROM system.numbers limit 1000

SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value FROM default.dimension_tab GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id)) ORDER BY fact_1_id, fact_2_id, fact_3_id

In Oracle, the result is

FACT_1_ID FACT_2_ID FACT_3_ID SALES_VALUE GROUPING_ID


     1          1                4363.55           1
     1          2                4794.76           1
     1          3                4718.25           1
     1          4                5387.45           1
     1          5                5027.34           1
     1                     1      2737.4           2
     1                     2     1854.29           2
     1                     3     2090.96           2
     1                     4     2605.17           2
     1                     5     2590.93           2
     1                     6      2506.9           2
     1                     7     1839.85           2
     1                     8     2953.04           2
     1                     9     2778.75           2
     1                    10     2334.06           2
     2          1                5652.84           1
     2          2                4583.02           1
     2          3                5555.77           1
     2          4                5936.67           1
     2          5                4508.74           1
     2                     1     3512.69           2
     2                     2     2847.94           2
     2                     3      2972.5           2
     2                     4     2534.06           2
     2                     5     3115.99           2
     2                     6     2775.85           2
     2                     7     2208.19           2
     2                     8     2358.55           2
     2                     9     1884.11           2
     2                    10     2027.16           2

The result in this implementation:

FACT_1_ID FACT_2_ID FACT_3_ID SALES_VALUE GROUPING_ID


  0	        0      	1	4820
  0	        0      	2	5222
  0      	0      	3	5130
  0	        0	4	5426
  0      	0	5	4782
  0       	0	6	4115
  0       	0	7	5752
  0	        0	8	5385
  0      	0	9	4772
  0      	0	10	5533
  0      	1	0	8935
  0      	2	0	10974
  0      	3	0	10515
  0      	4	0	10198
  0       	5	0	10315
  1       	0	0	25256
  2      	0	0	25681

The result of this implementation is quite different from the result in Oracle. I suppose there is some issue with this implementation. Correct me if I am wrong

@taylor12805
Copy link
Copy Markdown
Contributor

@KochetovNicolai Hi, KochetovNicolai, I'm trying to fix grouping sets multiple elements related issue, i wonder if we should support no parenthesis
element inside grouping sets like: GROUPING SETS(A, (A, B), (A, C)) or should only contains elements inside parenthesis like: GROUPING SETS((A), (A, B), (A, C))

@KochetovNicolai
Copy link
Copy Markdown
Member

@taylor12805 Hi!
I think we need to support all the cases. Query with group by grouping sets (x, (x), (y, z), ()) works in Postrges, so I think our syntax should also support it.

I'm trying to fix grouping sets multiple elements related issue

Do you need some help with it? I think I may also finish this pr, but I would really appreciate any assistance from you.

@taylor12805
Copy link
Copy Markdown
Contributor

@taylor12805 Hi!
I think we need to support all the cases. Query with group by grouping sets (x, (x), (y, z), ()) works in Postrges, so I think our syntax should also support it.

I'm trying to fix grouping sets multiple elements related issue

Do you need some help with it? I think I may also finish this pr, but I would really appreciate any assistance from you.

@KochetovNicolai Hi!
I am just fixed the issue of multiple grouping sets not working correctly in AggregatingStep. There are still some remaining work in TreeRewriter part. How is urs? Let me know anywhere I can give a hand

@taylor12805 taylor12805 mentioned this pull request Jul 28, 2021
@sensatyaki
Copy link
Copy Markdown

What code changes have been done to handle the empty set with () parenthesis, at the parsing level? For example ((A),()).

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ KochetovNicolai
❌ MaxTheHuman
You have signed the CLA already but the status is still pending? Let us recheck it.

@novikd
Copy link
Copy Markdown
Member

novikd commented Dec 17, 2021

GROUPING SETS are added in #26869

@novikd novikd closed this Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants