feat: add optimizer config param to avoid grouping partitions prefer_existing_union#10259
feat: add optimizer config param to avoid grouping partitions prefer_existing_union#10259alamb merged 3 commits intoapache:mainfrom
prefer_existing_union#10259Conversation
| plan, | ||
| !first_enforce_distribution, | ||
| prefer_existing_sort, | ||
| prefer_existing_union |
There was a problem hiding this comment.
Observing the test union_to_interleave above it. If it covers all needed cases, these 2 tests will cover all needed cases for the not-convert, too. Let me know if it is not the case
|
@alamb Is this PR is good enough? We do not need to set |
| plan = if plan.as_any().is::<UnionExec>() && can_interleave(children_plans.iter()) { | ||
|
|
||
| plan = if plan.as_any().is::<UnionExec>() | ||
| && !config.optimizer.prefer_existing_union |
There was a problem hiding this comment.
Given that one of the motivations (influxdata#4) for this new flag is to preserve the sorting of the Union - would re-using the existing flag prefer_existing_sort make sense here?
One argument against that would be if you wanted prefer_existing_sort: false and prefer_existing_union: true.
There was a problem hiding this comment.
Given that one of the motivations (influxdata#4) for this new flag is to preserve the sorting of the Union - would re-using the existing flag
prefer_existing_sortmake sense here?One argument against that would be if you wanted
prefer_existing_sort: falseandprefer_existing_union: true.
This is an excellent point @phillipleblanc . I just double checked and we actually have prefer_existing_sort set to true in IOx already (code ref, not public)
What do you think about using the existing flag @NGA-TRAN ?
There was a problem hiding this comment.
@phillipleblanc and @alamb
@mustafasrepo 's comment #10259 (comment) suggest to use the current approach. Do I need to do anything more here?
There was a problem hiding this comment.
Nope, I think this approach looks good! Thanks!
alamb
left a comment
There was a problem hiding this comment.
Thanks @NGA-TRAN -- this is looking like it is pretty close in my opinion
I think @phillipleblanc 's suggestions are quite good and we should consider them carefully
| plan = if plan.as_any().is::<UnionExec>() && can_interleave(children_plans.iter()) { | ||
|
|
||
| plan = if plan.as_any().is::<UnionExec>() | ||
| && !config.optimizer.prefer_existing_union |
There was a problem hiding this comment.
Given that one of the motivations (influxdata#4) for this new flag is to preserve the sorting of the Union - would re-using the existing flag
prefer_existing_sortmake sense here?One argument against that would be if you wanted
prefer_existing_sort: falseandprefer_existing_union: true.
This is an excellent point @phillipleblanc . I just double checked and we actually have prefer_existing_sort set to true in IOx already (code ref, not public)
What do you think about using the existing flag @NGA-TRAN ?
|
|
||
| ($EXPECTED_LINES: expr, $PLAN: expr, $FIRST_ENFORCE_DIST: expr, $PREFER_EXISTING_SORT: expr) => { | ||
| assert_optimized!($EXPECTED_LINES, $PLAN, $FIRST_ENFORCE_DIST, $PREFER_EXISTING_SORT, 10, false, 1024); | ||
| assert_optimized!($EXPECTED_LINES, $PLAN, $FIRST_ENFORCE_DIST, $PREFER_EXISTING_SORT, 10, false, 1024, false); |
There was a problem hiding this comment.
these macros are getting a bit hairy -- maybe we can clean them up (convert to using functions rather than macros) in a subsequent PR
|
cc @mustafasrepo and @ozankabak for your thoughts |
|
I think having dedicated config setting is more verbose and clear (as in plan.as_any().is::<UnionExec>()
&& !config.optimizer.prefer_existing_sort
&& can_interleave(children_plans.iter())this will prefer In the future, if we add support for |
Sounds good to me.
This is an excellent idea, I will file a ticket to track the idea @NGA-TRAN can you get the tests passing on this PR and we'll give it another review? |
|
@alamb @phillipleblanc and @mustafasrepo |
prefer_existing_union
alamb
left a comment
There was a problem hiding this comment.
This PR looks good to me.
Thank you for the reviews @phillipleblanc @mustafasrepo and @ozankabak
I plan to file the follow on tickets (e.g. about sort preserving interleave) tomorrow
mustafasrepo
left a comment
There was a problem hiding this comment.
This PR is LGTM! Thanks @NGA-TRAN
| plan = if plan.as_any().is::<UnionExec>() && can_interleave(children_plans.iter()) { | ||
|
|
||
| plan = if plan.as_any().is::<UnionExec>() | ||
| && !config.optimizer.prefer_existing_union |
There was a problem hiding this comment.
Nope, I think this approach looks good! Thanks!
|
Since this PR is good to go, merging it in. I am also in the process of filing the follow on work |
…_existing_union` (apache#10259) * feat: add a config param to avoid converting union to interleave * chore: update config for the tests * chore: update configs.md
|
Filed #10314 to track order preserving UnionExec |
|
I also started collecting sorted related optimizations in #10313 as well |
…_existing_union` (apache#10259) * feat: add a config param to avoid converting union to interleave * chore: update config for the tests * chore: update configs.md
…_existing_union` (apache#10259) * feat: add a config param to avoid converting union to interleave * chore: update config for the tests * chore: update configs.md
Which issue does this PR close?
Closes ##10257
Rationale for this change
What changes are included in this PR?
Add an option to avoid converting Union to Interleave
Are these changes tested?
Yes
Are there any user-facing changes?
No, because by the fault, nothing is changed