[RFC] use statistic to order prewhere conditions better #53240
Merged
hanfei1991 merged 39 commits into ClickHouse:master on Nov 29, 2023
CurtizJ
reviewed
Oct 24, 2023
Contributor
|
Hi @hanfei1991, I am interested in your PR. I have some questions and look forward to your answers.
|
Member
Author
|
Contributor
|
Thanks.
Contributor
|
@hanfei1991 I am going to add more statistic types. One question: how should we support multiple types of statistics, such as hyperloglog and cm-sketch, in SQL? solution 2 (incompatible with the current version) solution 3 I prefer the 3rd one. Which do you prefer, or do you have a better one?
Member
Author
|
The 1st one is an amendment to the 3rd one, @JackyWoo.
What is a statistic
An explanation from SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
We use tdigest as a practical histogram.
How to create / manipulate statistics in ClickHouse
We treat a statistic as a property of a table, like CODEC, TTL, ... but we store and manipulate statistics separately, like INDEX, PROJECTION.
create
manipulate
How to store it in a part
We store a single file containing all types of statistics for every column that has a statistic.
and
How do we use statistics in the where optimizer
If the prewhere condition is like a < 5 and b > 1 and c < 4.0, then we try to re-order the conditions according to their selectivity.
More work needed for this PR:
lessThan for TDigest
Hope that I can keep the added (+) lines within 2000 😺
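To make the prewhere re-ordering described above concrete, here is a minimal Python sketch. It is not the code in this PR: `ColumnSample`, `estimate_selectivity` and `order_prewhere` are hypothetical names, an exact sorted sample stands in for the tdigest, and the sketch assumes that conditions are ordered only by estimated selectivity (most selective first) and that a column without a statistic is treated as filtering nothing.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class ColumnSample:
    """Hypothetical stand-in for a per-column statistic (a sorted sample, not a real tdigest)."""
    sorted_values: list

    def fraction_less_than(self, x):
        # Share of sampled values strictly below x.
        return bisect_left(self.sorted_values, x) / len(self.sorted_values)

    def fraction_greater_than(self, x):
        # Share of sampled values strictly above x.
        return 1.0 - bisect_right(self.sorted_values, x) / len(self.sorted_values)


def estimate_selectivity(stats, column, op, value):
    """Estimated fraction of rows that pass `column op value`."""
    stat = stats.get(column)
    if stat is None:
        return 1.0  # assumption: no statistic means we cannot expect any filtering
    if op == "<":
        return stat.fraction_less_than(value)
    if op == ">":
        return stat.fraction_greater_than(value)
    raise ValueError(f"unsupported operator: {op}")


def order_prewhere(conditions, stats):
    """Put the conditions expected to pass the fewest rows first."""
    return sorted(conditions, key=lambda c: estimate_selectivity(stats, *c))


# Example from the description: a < 5 and b > 1 and c < 4.0
stats = {
    "a": ColumnSample(list(range(100))),              # a < 5   passes about 5%
    "b": ColumnSample(list(range(100))),              # b > 1   passes about 98%
    "c": ColumnSample([float(i) for i in range(10)]), # c < 4.0 passes about 40%
}
conditions = [("a", "<", 5), ("b", ">", 1), ("c", "<", 4.0)]
ordered = order_prewhere(conditions, stats)
```

With these made-up samples the resulting order is a < 5, then c < 4.0, then b > 1, so the condition expected to discard the most rows is evaluated first.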
What to do in the future
#55065
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use statistics to order prewhere conditions better.
Documentation entry for user-facing changes