Skip to content

Auto distributed_group_by_no_merge on GROUP BY sharding key#10341

Merged
alexey-milovidov merged 6 commits intoClickHouse:masterfrom
azat:auto_distributed_group_by_no_merge
Apr 20, 2020
Merged

Auto distributed_group_by_no_merge on GROUP BY sharding key#10341
alexey-milovidov merged 6 commits intoClickHouse:masterfrom
azat:auto_distributed_group_by_no_merge

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Apr 17, 2020

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Auto distributed_group_by_no_merge on GROUP BY sharding key (if optimize_skip_unused_shards is set)

Fixes: #332

@blinkov blinkov added the pr-improvement Pull request with some product improvements label Apr 17, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Apr 18, 2020

What I'm wondering before looking into code... is that distributed_group_by_no_merge just sets query processing stage to Complete instead of WithMergeableState and it works not only for GROUP BY queries, but will also prevent merging of streams for queries with ORDER BY and without GROUP BY.

Let's check that.

@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 18, 2020

prevent merging of streams for queries with ORDER BY

Indeed, but there is if (select.orderBy()) return false and test

@alexey-milovidov
Copy link
Copy Markdown
Member

What if there are other operations that require merging, e.g.
LIMIT - calculates preliminary limit on shards, apply final LIMIT on initiator;
DISTINCT - the same...
?

@azat azat force-pushed the auto_distributed_group_by_no_merge branch from 534a365 to 034152a Compare April 18, 2020 15:02
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Apr 18, 2020

Ok. The code becomes complicated and bug-prone but still acceptable.

Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

(we can experiment in a separate PR)

@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 18, 2020

Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

Yeah, I was just "experimenting"
Another stage is a great idea, will take a look (even distinct by shading key is ok actually)

@azat azat force-pushed the auto_distributed_group_by_no_merge branch from 034152a to 35eec0a Compare April 19, 2020 15:29
@azat azat force-pushed the auto_distributed_group_by_no_merge branch from 35eec0a to 93d049f Compare April 19, 2020 16:19
@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 19, 2020

Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

Decided to address this separately (to avoid polluting this patch set, since I guess there will be some corner cases that I will forget about)

@alexey-milovidov
Copy link
Copy Markdown
Member

Ok.

@azat
Copy link
Copy Markdown
Member Author

azat commented Aug 20, 2020

Maybe we can solve the task by introducing another QueryProcessingStage: WithMergeableStateAfterAggregation (it means that aggregate functions were calculated and finalized) and process it inside InterpreterSelectQuery?

@alexey-milovidov #10373 implements this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FR: count unique values only locally if it is known that they do not overlap between servers

3 participants