feat: Support RightMark join for NestedLoop and Hash join#16083
feat: Support RightMark join for NestedLoop and Hash join#16083alamb merged 23 commits intoapache:mainfrom
Conversation
|
cc @comphead |
|
This is cool! I got some questions:
|
|
@2010YOUY01 I think i'lll try to get sql queries optimized into a right mark join after support for symmetric hash join + sort merge join. Right mark is equivalent to left mark however internally I didn't swap during planning time. I had just marked a column on the probe batch as probe batches were incoming. |
|
@2010YOUY01 @Dandandan Is it possible to take a look? Thanks! |
|
Really cool work! I'll look over it in more detail over the weekend =) |
|
After updating Details |
comphead
left a comment
There was a problem hiding this comment.
Thanks @jonathanc-n I quickly went through the PR
It would be nice to have fuzz tests for RightMark join, I'm not sure 3 tests would cover all possible cases, WDYT?
|
Alrighty! Regarding the lack of support for RightMark joins in some join operators, I believe it would be best to return an error in the constructor of those operators if they do not support the JoinType. I believe the PR currently would just return a wrong result. I don't (yet) know how much work it would be to adjust physical optimizer rules so that no bad JoinType x OperatorType combination gets constructed. Algorithmically, what I found hardest to understand is the switcheroo between build and probe side - You've modified I think a less confusing alternative would be to just add a |
I think instead we should just not support swapping for now and support it after all the join operators are supported. Returning an error will cause some tests to fail. |
| | JoinType::LeftMark => { | ||
| | JoinType::LeftMark | ||
| | JoinType::RightMark => { |
There was a problem hiding this comment.
I admit that I don't understand why LeftMark/RightMark are different from their Semi/Anti counterparts. It seems like the documentation of this method is missing LeftMark / RightMark too
There was a problem hiding this comment.
Added the doc! I'm not quite sure either, I think it has to do with the extra boolean column crossing the boundary into the other side, resulting in the mark column being missing from the requirements
| let mut bitmap = BooleanBufferBuilder::new(range.len()); | ||
| bitmap.append_n(range.len(), false); | ||
| input_indices | ||
| .iter() | ||
| .flatten() | ||
| .map(|v| v.as_usize()) | ||
| .filter(|v| range.contains(v)) | ||
| .for_each(|v| { | ||
| bitmap.set_bit(v - range.start, true); | ||
| }); |
There was a problem hiding this comment.
I noticed that these lines are the same for get_mark_indices, get_semi_indices and get_anti_indices. We could factor them out to a new method?
There was a problem hiding this comment.
map.filter.foreach wondering can it be done in single iteration?
There was a problem hiding this comment.
.for_each(|v| {
let idx = v.as_usize();
if range.contains(&idx) {
builder.set_bit(idx - range.start, true);
}
});
Could do something like this?
There was a problem hiding this comment.
I havI have refactored to use this
Co-authored-by: Christian <[email protected]>
| .await | ||
| } | ||
|
|
||
| // todo: add JoinTestType::HjSmj after Right mark SortMergeJoin support |
There was a problem hiding this comment.
It is needed to add RightMark Join support to SortMergeJoin execution as well
There was a problem hiding this comment.
Oh I will close #16226 as it is a duplicate
| .collect::<PrimitiveArray<L>>(); | ||
| let probe_indices = (0..prune_length) | ||
| .map(|idx| { | ||
| // For mark join we output a dummy index 0 to indicate the row had a match |
There was a problem hiding this comment.
that would be nice to show as an example, it is challenging to read algorithms for upcoming contributors, but it can be done as follow up PR
comphead
left a comment
There was a problem hiding this comment.
Thanks @jonathanc-n for bearing with this PR, I think it is good to go.
|
Thanks for this contribution, I'm planning to have this PR open for a little bit of more time to see if there are any other feedbacks |
|
🚀 -- I am feeling physically nervous that there are so many PRs open so starting the merge train! |
Which issue does this PR close?
Part of #13138 .
Rationale for this change
Revamp implementation of the previous stale implementation for RightMark
What changes are included in this PR?
Added support for right mark join functionality to NestedLoop + Hash join. A follow up pr will be made for supporting right mark in Sortmergejoin and sym hash join.
Are these changes tested?
Yes unit tests