Skip to content

Comments

[SPARK-44060][SQL] Code-gen for build side outer shuffled hash join#41614

Closed
szehon-ho wants to merge 8 commits intoapache:masterfrom
szehon-ho:same_side_outer_join_codegen_master
Closed

[SPARK-44060][SQL] Code-gen for build side outer shuffled hash join#41614
szehon-ho wants to merge 8 commits intoapache:masterfrom
szehon-ho:same_side_outer_join_codegen_master

Conversation

@szehon-ho
Copy link
Member

What changes were proposed in this pull request?

Codegen of shuffled hash join of build side outer join (ie, left outer join build left or right outer join build right)

Why are the changes needed?

The implementation of #41398 was only for non-codegen version, and codegen was disabled in this scenario.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit test in WholeStageCodegenSuite

@github-actions github-actions bot added the SQL label Jun 15, 2023
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't entirely sure if this is the right approach, but the existing flag was a bit explicit in FULL OUTER JOIN, and in the discussion like #41398 (comment) build-side outer join is taken to mean, if only one side is OUTER.

Copy link
Member Author

@szehon-ho szehon-ho Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is not strictly necessary, but wanted to keep coverage of the non-codegen codepath.

However, it is true that the "SPARK-32399: Full outer shuffled hash join" test does not do this, iiuc leaving the non-codegen codepath actually untested for full-outer-join case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for keeping the test coverage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we test both the codegen and non-codegen codepath? This test seems to have more coverage than the test in WholeStageCodegenSuite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let me make do both then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c21 Shall we have a follow-up to also test non-codegen codepath in "SPARK-32399: Full outer shuffled hash join" test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I think that'd be valuable, I could add that too , to the test above, in a follow up pr

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR again, @szehon-ho .

Also, cc @viirya , @huaxingao , @sunchao , @cloud-fan , too

@huaxingao
Copy link
Contributor

+1, LGTM from my side. Thanks for working on this! @szehon-ho
Let's wait a couple of days for others to review.

Comment on lines 595 to 599
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems isFullOuterJoin could be pull to outside if?

if (!$foundMatch && $isFullOuterJoin) {
  $buildRow = null;
  $consumeFullOuterJoinRow();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was afraid of side effect if I don't set buildRow to null, if not fullOuterJoin. Let me check.

Copy link
Member Author

@szehon-ho szehon-ho Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to leave buildRow=null in both case if !foundMatch. I was thinking its cleaner, to clear the state, and more in line with uniqueKey side. Let me know what you think.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay but some minor comments.

@szehon-ho szehon-ho force-pushed the same_side_outer_join_codegen_master branch from cee52fe to 4213175 Compare June 27, 2023 05:32
@szehon-ho
Copy link
Member Author

rebase to try to fix test error

@szehon-ho
Copy link
Member Author

@viirya if you have some time could you take another look? Replied but didnt resolve to the last comment, let me know your thoughts. Thanks!

@viirya
Copy link
Member

viirya commented Jun 27, 2023

Thanks @szehon-ho for addressing the comments. I will take another look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consumeOuterJoinRow.

 ### What changes were proposed in this pull request?
Codegen of shuffled hash join of build side outer join (ie, left outer join build left or right outer join build right)

 ### Why are the changes needed?

The implementation of apache#41398 was only for non-codegen version, and codegen was disabled in this scenario.

 ### Does this PR introduce _any_ user-facing change?

No

 ### How was this patch tested?

New unit test in WholeStageCodegenSuite
@szehon-ho szehon-ho force-pushed the same_side_outer_join_codegen_master branch from 4a060da to f9dccff Compare June 30, 2023 17:32
@szehon-ho
Copy link
Member Author

Trying again, test failure didn't look very related though not entirely sure.

2023-06-30T06:10:24.5434667Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m- error notifier test *** FAILED *** (1 minute)�[0m�[0m
2023-06-30T06:10:24.5436553Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m  java.nio.file.NoSuchFileException: /home/runner/work/spark/spark/target/tmp/spark-cc3d10d8-5d0b-4592-9971-74bba230b8a8�[0m�[0m
2023-06-30T06:10:24.5438354Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)�[0m�[0m
2023-06-30T06:10:24.5439946Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)�[0m�[0m
2023-06-30T06:10:24.5441984Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)�[0m�[0m
...
2023-06-30T06:41:10.1873780Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m*** 1 TEST FAILED ***�[0m�[0m
2023-06-30T06:41:10.2104588Z �[0m[�[0m�[31merror�[0m] �[0m�[0mFailed: Total 9312, Failed 1, Errors 0, Passed 9311, Ignored 27�[0m
2023-06-30T06:41:10.2332475Z �[0m[�[0m�[31merror�[0m] �[0m�[0mFailed tests:�[0m
2023-06-30T06:41:10.2334292Z �[0m[�[0m�[31merror�[0m] �[0m�[0m	org.apache.spark.sql.execution.streaming.MicroBatchExecutionSuite�[0m

@viirya
Copy link
Member

viirya commented Jun 30, 2023

Seems irrelevant.

@huaxingao huaxingao closed this in 2db8cfb Jul 1, 2023
@huaxingao
Copy link
Contributor

Merged to master! Thanks @szehon-ho @viirya @dongjoon-hyun

| }
|
| $consumeFullOuterJoinRow();
| if ($foundMatch || $isFullOuterJoin) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can statically remove or simply the condition:

val consumeCode = if (isFullOuterJoin) {
  s"$consumeOuterJoinRow();"
} else {
  s"""
    |if ($foundMatch) {
    |  $consumeOuterJoinRow();
    |}
  """
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks , let me make some follow ups a bit later

@cloud-fan
Copy link
Contributor

late LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants