[SPARK-44060][SQL] Code-gen for build side outer shuffled hash join#41614
[SPARK-44060][SQL] Code-gen for build side outer shuffled hash join#41614szehon-ho wants to merge 8 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
I wasn't entirely sure if this is the right approach, but the existing flag was a bit explicit in FULL OUTER JOIN, and in the discussion like #41398 (comment) build-side outer join is taken to mean, if only one side is OUTER.
There was a problem hiding this comment.
Note: this is not strictly necessary, but wanted to keep coverage of the non-codegen codepath.
However, it is true that the "SPARK-32399: Full outer shuffled hash join" test does not do this, iiuc leaving the non-codegen codepath actually untested for full-outer-join case.
There was a problem hiding this comment.
+1 for keeping the test coverage.
There was a problem hiding this comment.
Shall we test both the codegen and non-codegen codepath? This test seems to have more coverage than the test in WholeStageCodegenSuite.
There was a problem hiding this comment.
OK let me make do both then
There was a problem hiding this comment.
@c21 Shall we have a follow-up to also test non-codegen codepath in "SPARK-32399: Full outer shuffled hash join" test?
There was a problem hiding this comment.
Done! I think that'd be valuable, I could add that too , to the test above, in a follow up pr
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for making a PR again, @szehon-ho .
Also, cc @viirya , @huaxingao , @sunchao , @cloud-fan , too
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
Outdated
Show resolved
Hide resolved
|
+1, LGTM from my side. Thanks for working on this! @szehon-ho |
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Seems isFullOuterJoin could be pull to outside if?
if (!$foundMatch && $isFullOuterJoin) {
$buildRow = null;
$consumeFullOuterJoinRow();
}There was a problem hiding this comment.
I was afraid of side effect if I don't set buildRow to null, if not fullOuterJoin. Let me check.
There was a problem hiding this comment.
I wanted to leave buildRow=null in both case if !foundMatch. I was thinking its cleaner, to clear the state, and more in line with uniqueKey side. Let me know what you think.
viirya
left a comment
There was a problem hiding this comment.
Looks okay but some minor comments.
cee52fe to
4213175
Compare
|
rebase to try to fix test error |
|
@viirya if you have some time could you take another look? Replied but didnt resolve to the last comment, let me know your thoughts. Thanks! |
|
Thanks @szehon-ho for addressing the comments. I will take another look. |
### What changes were proposed in this pull request? Codegen of shuffled hash join of build side outer join (ie, left outer join build left or right outer join build right) ### Why are the changes needed? The implementation of apache#41398 was only for non-codegen version, and codegen was disabled in this scenario. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New unit test in WholeStageCodegenSuite
4a060da to
f9dccff
Compare
|
Trying again, test failure didn't look very related though not entirely sure. |
|
Seems irrelevant. |
|
Merged to master! Thanks @szehon-ho @viirya @dongjoon-hyun |
| | } | ||
| | | ||
| | $consumeFullOuterJoinRow(); | ||
| | if ($foundMatch || $isFullOuterJoin) { |
There was a problem hiding this comment.
nit: we can statically remove or simply the condition:
val consumeCode = if (isFullOuterJoin) {
s"$consumeOuterJoinRow();"
} else {
s"""
|if ($foundMatch) {
| $consumeOuterJoinRow();
|}
"""
}
There was a problem hiding this comment.
Thanks , let me make some follow ups a bit later
|
late LGTM |
What changes were proposed in this pull request?
Codegen of shuffled hash join of build side outer join (ie, left outer join build left or right outer join build right)
Why are the changes needed?
The implementation of #41398 was only for non-codegen version, and codegen was disabled in this scenario.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit test in WholeStageCodegenSuite