[SPARK-35984][SQL][TEST] Config to force applying shuffled hash join#33182
[SPARK-35984][SQL][TEST] Config to force applying shuffled hash join#33182linhongliu-db wants to merge 5 commits intoapache:masterfrom
Conversation
|
cc @cloud-fan |
|
Test build #140559 has finished for PR 33182 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test status success |
|
Test build #140556 has finished for PR 33182 at commit
|
There was a problem hiding this comment.
nit: we are on 3.3.0 now I think?
There was a problem hiding this comment.
nit: PREFER_SORTMERGEJOIN.key instead of spark.sql.join.perferSortMergejoin.
There was a problem hiding this comment.
I think we don't want user to use this config, and this should be only taking effect in testing right? Should we add condition e.g. Utils.isTesting?
16a0791 to
f3474a2
Compare
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #140613 has finished for PR 33182 at commit
|
| .booleanConf | ||
| .createWithDefault(true) | ||
|
|
||
| val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin") |
There was a problem hiding this comment.
we can just hardcode test-only configs.
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
| (!conf.preferSortMergeJoin && canBuildLocalHashMapBySize(left, conf) && | ||
| muchSmaller(left, right)) | ||
| muchSmaller(left, right)) || | ||
| (Utils.isTesting && forceApplyShuffledHashJoin(conf)) |
There was a problem hiding this comment.
nit: we can even move Utils.isTesting into forceApplyShuffledHashJoin
|
Test build #140701 has finished for PR 33182 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
thanks, merging to master/3.2 (to improve test coverage) |
### What changes were proposed in this pull request? Add a config `spark.sql.join.forceApplyShuffledHashJoin` to force applying shuffled hash join during the join selection. ### Why are the changes needed? In the `SQLQueryTestSuite`, we want to cover 3 kinds of join (BHJ, SHJ, SMJ) in join.sql. But even if the `spark.sql.join.preferSortMergeJoin` is set to `false`, shuffled hash join is still not guaranteed. Thus, we need another config to force the selection. ### Does this PR introduce _any_ user-facing change? No, only for testing ### How was this patch tested? newly added tests Verified all queries in join.sql will use `ShuffledHashJoin` when the config set to `true` Closes #33182 from linhongliu-db/SPARK-35984-hash-join-config. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 7566db6) Signed-off-by: Wenchen Fan <[email protected]>
|
Test build #140710 has finished for PR 33182 at commit
|
…in test in-joins.sql ### What changes were proposed in this pull request? We found the `in-join.sql` does not test shuffled hash join properly in https://issues.apache.org/jira/browse/SPARK-32577, but didn't find a good way to fix it. Given we now have a test config to enforce shuffled hash join in #33182, we can fix the test here now as well. ### Why are the changes needed? Fix test to have better test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Reran the test to compare the output, and verified the query plan manually to make sure shuffled hash join being used. Closes #33236 from c21/join-test. Authored-by: Cheng Su <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…in test in-joins.sql ### What changes were proposed in this pull request? We found the `in-join.sql` does not test shuffled hash join properly in https://issues.apache.org/jira/browse/SPARK-32577, but didn't find a good way to fix it. Given we now have a test config to enforce shuffled hash join in #33182, we can fix the test here now as well. ### Why are the changes needed? Fix test to have better test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Reran the test to compare the output, and verified the query plan manually to make sure shuffled hash join being used. Closes #33236 from c21/join-test. Authored-by: Cheng Su <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]> (cherry picked from commit f3c1159) Signed-off-by: Hyukjin Kwon <[email protected]>
Add a config `spark.sql.join.forceApplyShuffledHashJoin` to force applying shuffled hash join during the join selection. In the `SQLQueryTestSuite`, we want to cover 3 kinds of join (BHJ, SHJ, SMJ) in join.sql. But even if the `spark.sql.join.preferSortMergeJoin` is set to `false`, shuffled hash join is still not guaranteed. Thus, we need another config to force the selection. No, only for testing newly added tests Verified all queries in join.sql will use `ShuffledHashJoin` when the config set to `true` Closes #33182 from linhongliu-db/SPARK-35984-hash-join-config. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Add a config
spark.sql.join.forceApplyShuffledHashJointo force applying shuffled hash joinduring the join selection.
Why are the changes needed?
In the
SQLQueryTestSuite, we want to cover 3 kinds of join (BHJ, SHJ, SMJ) in join.sql. But evenif the
spark.sql.join.preferSortMergeJoinis set tofalse, shuffled hash join is still not guaranteed.Thus, we need another config to force the selection.
Does this PR introduce any user-facing change?
No, only for testing
How was this patch tested?
newly added tests
Verified all queries in join.sql will use
ShuffledHashJoinwhen the config set totrue