Auto distributed_group_by_no_merge on GROUP BY sharding key by azat · Pull Request #10341 · ClickHouse/ClickHouse

azat · 2020-04-17T23:20:47Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Auto distributed_group_by_no_merge on GROUP BY sharding key (if optimize_skip_unused_shards is set)

Fixes: #332
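The idea behind the optimization is this. If every row with a given GROUP BY key lives on exactly one shard, each shard can finalize its groups locally, and the initiator only needs to concatenate the results. Below is a minimal Python simulation of that reasoning. It assumes a hypothetical `key % num_shards` sharding expression and SUM as the aggregate, and it is not ClickHouse code.

```python
from collections import defaultdict

def shard_of(key, num_shards):
    # Hypothetical sharding expression: key modulo the number of shards.
    return key % num_shards

def aggregate(rows):
    # GROUP BY key, SUM(value), fully finalized for the given rows.
    out = defaultdict(int)
    for key, value in rows:
        out[key] += value
    return dict(out)

def distribute(rows, num_shards):
    # Route every row to the shard chosen by the sharding expression.
    shards = [[] for _ in range(num_shards)]
    for key, value in rows:
        shards[shard_of(key, num_shards)].append((key, value))
    return shards

rows = [(1, 10), (2, 5), (3, 7), (1, 1), (4, 2), (2, 3)]
shards = distribute(rows, num_shards=2)

# Each shard finishes the aggregation on its own (like stage Complete).
per_shard = [aggregate(s) for s in shards]

# The GROUP BY key is the sharding key, so no key appears on two shards,
# and the initiator can simply concatenate the per-shard results.
concatenated = {}
for result in per_shard:
    assert not (concatenated.keys() & result.keys())
    concatenated.update(result)

assert concatenated == aggregate(rows)
```

If the GROUP BY key were not the sharding key, the disjointness assertion could fail. The same group would then come back from several shards and would have to be merged.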

alexey-milovidov · 2020-04-18T01:20:53Z

What I'm wondering about, before looking into the code: distributed_group_by_no_merge just sets the query processing stage to Complete instead of WithMergeableState. So it affects not only GROUP BY queries; it will also prevent merging of streams for queries with ORDER BY and without GROUP BY.

Let's check that.
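Here is a small Python illustration of that concern, using made-up shard data. If each shard sorts its own rows and the initiator merely concatenates them, the result is not globally sorted. That is why the initiator normally merges the sorted streams. `heapq.merge` stands in for the initiator's merge step here; it is not the ClickHouse implementation.

```python
import heapq

# Each shard returns its rows already sorted (ORDER BY x executed locally).
shard_streams = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

# Stage Complete on the shards plus plain concatenation on the initiator.
concatenated = [x for stream in shard_streams for x in stream]

# WithMergeableState-style handling: the initiator merges the sorted streams.
merged = list(heapq.merge(*shard_streams))

print(concatenated)  # [1, 4, 9, 2, 3, 10, 5, 6, 7] (not globally sorted)
print(merged)        # [1, 2, 3, 4, 5, 6, 7, 9, 10]
```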

azat · 2020-04-18T07:18:49Z

> prevent merging of streams for queries with ORDER BY

Indeed, but there is an `if (select.orderBy()) return false` check, and a test that covers it.
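The decision can be modeled roughly as a predicate. The sketch below is a hypothetical Python model, loosely named after the StorageDistributed::canForceGroupByNoMerge() method this PR touches. It is not a port of that method. `SelectQuery` and its fields are invented for illustration, and only the conditions mentioned in this thread are modeled.

```python
from dataclasses import dataclass, field

@dataclass
class SelectQuery:
    # Hypothetical, heavily simplified model of a SELECT (not ClickHouse's AST).
    group_by: list = field(default_factory=list)
    order_by: list = field(default_factory=list)

def can_force_group_by_no_merge(query, sharding_key, optimize_skip_unused_shards):
    # Per the changelog entry, the optimization requires this setting.
    if not optimize_skip_unused_shards:
        return False
    # Mirrors the `if (select.orderBy()) return false` guard: per-shard sorted
    # results would still have to be merged on the initiator.
    if query.order_by:
        return False
    # Every group must live on a single shard, i.e. the GROUP BY list must
    # contain the sharding key (an empty GROUP BY list also returns False).
    return sharding_key in query.group_by

print(can_force_group_by_no_merge(SelectQuery(group_by=["user_id"]), "user_id", True))  # True
print(can_force_group_by_no_merge(
    SelectQuery(group_by=["user_id"], order_by=["user_id"]), "user_id", True))        # False
```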

alexey-milovidov · 2020-04-18T11:33:15Z

What if there are other operations that require merging? For example:
- LIMIT: a preliminary LIMIT is calculated on the shards, and the final LIMIT is applied on the initiator;
- DISTINCT: the same...
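A toy Python sketch of the LIMIT case, with made-up per-shard results. If the shards run to stage Complete and the initiator only concatenates, a query with LIMIT n can return up to n rows per shard. The final LIMIT on the initiator is what caps the result at n. `run_limit` is a hypothetical helper.

```python
def run_limit(shard_results, n):
    # Each shard applies a preliminary LIMIT n to its own (already final) rows.
    preliminary = [rows[:n] for rows in shard_results]
    # Without a merge step, the initiator would just concatenate them...
    concatenated = [row for rows in preliminary for row in rows]
    # ...which can yield up to n * number_of_shards rows, so the initiator
    # still has to apply the final LIMIT n.
    final = concatenated[:n]
    return concatenated, final

shard_results = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]
concatenated, final = run_limit(shard_results, 2)
print(len(concatenated), final)  # 6 ['a1', 'a2']
```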

alexey-milovidov · 2020-04-18T18:01:11Z

Ok. The code becomes complicated and bug-prone but still acceptable.

Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

(we can experiment in a separate PR)
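One way to read the proposal is as a statement about how much work is left for the initiator at each stage. The Python model below uses the stage names from this thread. The step lists are a simplified, hypothetical model rather than what InterpreterSelectQuery actually does. The middle stage is only valid when groups do not span shards, as with GROUP BY on the sharding key.

```python
from enum import Enum

class Stage(Enum):
    # Stage names as discussed in this thread; values are just labels here.
    WITH_MERGEABLE_STATE = "WithMergeableState"
    WITH_MERGEABLE_STATE_AFTER_AGGREGATION = "WithMergeableStateAfterAggregation"
    COMPLETE = "Complete"

# Hypothetical model of the work left for the initiator, depending on the
# stage at which the shards stopped.
INITIATOR_STEPS = {
    # Shards return partial aggregation states.
    Stage.WITH_MERGEABLE_STATE: [
        "merge aggregation states", "finalize aggregates",
        "DISTINCT", "merge sorted streams (ORDER BY)", "LIMIT",
    ],
    # Proposed: aggregates are already calculated and finalized on the shards,
    # but DISTINCT / ORDER BY / LIMIT still run on the initiator.
    Stage.WITH_MERGEABLE_STATE_AFTER_AGGREGATION: [
        "DISTINCT", "merge sorted streams (ORDER BY)", "LIMIT",
    ],
    # Shards return final results; the initiator only concatenates.
    Stage.COMPLETE: [],
}

for stage in Stage:
    print(f"{stage.value}: {INITIATOR_STEPS[stage] or 'concatenate only'}")
```

In this model the proposed stage sits between the two existing ones. It drops the aggregation merge that GROUP BY on the sharding key makes unnecessary, while keeping the LIMIT, DISTINCT and ORDER BY handling raised above.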

azat · 2020-04-18T18:06:57Z

> Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

Yeah, I was just "experimenting".
Another stage is a great idea; I will take a look (even DISTINCT by sharding key is actually OK).
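Why DISTINCT on the sharding key is safe without a merge, shown as a Python simulation on a hypothetical table of `(user_id, city)` rows sharded by `user_id % 2`. Per-shard DISTINCT on the sharding key yields disjoint sets, so concatenation is already the global DISTINCT. DISTINCT on any other column can repeat values across shards.

```python
# Hypothetical table: rows of (user_id, city), sharded by user_id % 2.
rows = [(1, "x"), (2, "y"), (1, "y"), (3, "x"), (2, "x")]
NUM_SHARDS = 2

shards = [[] for _ in range(NUM_SHARDS)]
for user_id, city in rows:
    shards[user_id % NUM_SHARDS].append((user_id, city))

def distinct_then_concat(column):
    # DISTINCT on each shard (order-preserving), then plain concatenation
    # on the initiator with no final DISTINCT.
    out = []
    for shard in shards:
        out.extend(dict.fromkeys(row[column] for row in shard))
    return out

by_key = distinct_then_concat(0)   # DISTINCT user_id (the sharding key)
by_city = distinct_then_concat(1)  # DISTINCT city (not the sharding key)
print(by_key)   # [2, 1, 3]
print(by_city)  # ['y', 'x', 'x', 'y']
```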

azat · 2020-04-19T22:23:34Z

> Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

I decided to address this separately, to avoid polluting this patch set (I expect there will be some corner cases that I would otherwise forget about).

alexey-milovidov · 2020-04-19T22:27:46Z

Ok.

azat · 2020-08-20T18:39:02Z

> Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

@alexey-milovidov #10373 implements this

blinkov added the pr-improvement label (Pull request with some product improvements) Apr 17, 2020

azat force-pushed the auto_distributed_group_by_no_merge branch from 534a365 to 034152a April 18, 2020 15:02

azat force-pushed the auto_distributed_group_by_no_merge branch from 034152a to 35eec0a April 19, 2020 15:29

azat added 3 commits April 19, 2020 18:33

Cover distributed_group_by_no_merge on GROUP BY injective function of sharding key (6f76f27)

Auto distributed_group_by_no_merge on GROUP BY injective function of sharding key (de4a723)

Allow auto distributed_group_by_no_merge for DISTINCT of sharding key (93d049f)
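The injective-function commits rely on a simple property. If `f` is injective, the value `f(key)` determines `key`, and therefore the shard, so every GROUP BY f(key) group still lives on one shard. The Python check below uses a hypothetical `key % 2` sharding expression. `key + 100` stands in for an injective function and `key // 2` for a non-injective one.

```python
from collections import defaultdict

def shard_of(key, num_shards=2):
    # Hypothetical sharding expression.
    return key % num_shards

def shards_per_group(keys, group_fn):
    # For each GROUP BY value, collect the set of shards holding its rows.
    seen = defaultdict(set)
    for key in keys:
        seen[group_fn(key)].add(shard_of(key))
    return dict(seen)

keys = range(10)
injective = shards_per_group(keys, lambda k: k + 100)  # one key per group
non_injective = shards_per_group(keys, lambda k: k // 2)  # keys 2 and 3 share group 1

# Injective: every group lives on exactly one shard, so no merge is needed.
assert all(len(s) == 1 for s in injective.values())
# Non-injective: group 1 = {keys 2, 3} spans shards 0 and 1, so a merge is needed.
assert non_injective[1] == {0, 1}
```

Injectivity is sufficient but not necessary. For example, `key % 10` is not injective, yet with `key % 2` sharding each of its groups still stays on one shard.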

azat force-pushed the auto_distributed_group_by_no_merge branch from 35eec0a to 93d049f April 19, 2020 16:19

azat added 3 commits April 19, 2020 20:52

Fix 01213_optimize_skip_unused_shards_DISTINCT after distributed_group_by_no_merge optimization (681034f)

Fix distributed_group_by_no_merge optimization for Distributed-over-Distributed (be1dec9)

Fix clang readability-container-size-empty warning in StorageDistributed::canForceGroupByNoMerge() (e44d5c5)

alexey-milovidov merged commit 1577d77 into ClickHouse:master Apr 20, 2020

azat deleted the auto_distributed_group_by_no_merge branch April 20, 2020 07:51

azat mentioned this pull request Apr 20, 2020: Optimize queries with LIMIT/LIMIT BY/ORDER BY for distributed with GROUP BY sharding_key #10373 (Merged)

azat mentioned this pull request Apr 26, 2020: Disable GROUP BY sharding_key optimization by default (and fix for WITH ROLLUP/CUBE/TOTALS) #10516 (Merged)

azat mentioned this pull request Jun 15, 2021: Customizable query block for distributed engine. #5986 (Closed)

alexey-milovidov merged 6 commits into ClickHouse:master from azat:auto_distributed_group_by_no_merge