[SPARK-35984][SQL][TEST] Config to force applying shuffled hash join #33182

Closed
linhongliu-db wants to merge 5 commits into apache:master from linhongliu-db:SPARK-35984-hash-join-config

linhongliu-db commented Jul 2, 2021

What changes were proposed in this pull request?

Add a config spark.sql.join.forceApplyShuffledHashJoin to force applying shuffled hash join
during the join selection.

Why are the changes needed?

In SQLQueryTestSuite, we want join.sql to cover three kinds of join (BHJ, SHJ, SMJ). But even
when spark.sql.join.preferSortMergeJoin is set to false, shuffled hash join is still not guaranteed
to be selected. Thus, we need another config to force the selection.
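
To illustrate why turning off spark.sql.join.preferSortMergeJoin is not enough on its own, here is a minimal sketch of a size-based selection check. It is not the actual Spark source: `PlanStats`, `JoinConf`, and `SizeBasedSelection` are hypothetical stand-ins, and the two heuristics (a build side bounded by the broadcast threshold times the shuffle partition count, and a build side at least three times smaller than the other side) are assumptions modeled on the names that appear in this PR's diff.

```scala
// Hypothetical stand-in for the size statistics of one side of a join.
final case class PlanStats(sizeInBytes: BigInt)

// Hypothetical, trimmed-down view of the session settings that matter here.
final case class JoinConf(
    preferSortMergeJoin: Boolean,
    autoBroadcastJoinThreshold: Long,
    numShufflePartitions: Int)

object SizeBasedSelection {
  // Assumed heuristic: the build side must be small enough per shuffle partition.
  def canBuildLocalHashMapBySize(plan: PlanStats, conf: JoinConf): Boolean =
    plan.sizeInBytes <
      BigInt(conf.autoBroadcastJoinThreshold) * BigInt(conf.numShufflePartitions)

  // Assumed heuristic: the build side is at least three times smaller.
  def muchSmaller(a: PlanStats, b: PlanStats): Boolean =
    a.sizeInBytes * 3 <= b.sizeInBytes

  // All three conditions must hold, so the flag alone guarantees nothing.
  def canUseShuffledHashJoin(left: PlanStats, right: PlanStats, conf: JoinConf): Boolean =
    !conf.preferSortMergeJoin &&
      canBuildLocalHashMapBySize(left, conf) &&
      muchSmaller(left, right)
}
```

Under these assumptions, a test that disables broadcast joins with a threshold of -1 makes the size bound negative, so the check fails regardless of the preferSortMergeJoin setting. Two inputs of similar size fail it as well.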

Does this PR introduce any user-facing change?

No, only for testing

How was this patch tested?

Newly added tests.
Verified that all queries in join.sql use ShuffledHashJoin when the config is set to true.

@github-actions github-actions bot added the SQL label Jul 2, 2021
@linhongliu-db
Contributor Author

cc @cloud-fan

SparkQA commented Jul 2, 2021

Test build #140559 has finished for PR 33182 at commit 16a0791.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 2, 2021

Test build #140556 has finished for PR 33182 at commit 24b39a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

nit: we are on 3.3.0 now I think?

nit: use PREFER_SORTMERGEJOIN.key instead of the hardcoded spark.sql.join.preferSortMergeJoin.

I think we don't want users to use this config, and it should only take effect in testing, right? Should we add a condition, e.g. Utils.isTesting?

@linhongliu-db linhongliu-db force-pushed the SPARK-35984-hash-join-config branch from 16a0791 to f3474a2 Compare July 3, 2021 09:00

SparkQA commented Jul 3, 2021

Test build #140613 has finished for PR 33182 at commit f3474a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.booleanConf
.createWithDefault(true)

val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
we can just hardcode test-only configs.
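
A rough sketch of what hardcoding could look like: the key is kept as a plain string and looked up with a "false" default instead of being registered through buildConf. The `TestOnlyConf` object and the `settings` map are hypothetical stand-ins for the session configuration, not Spark's API.

```scala
object TestOnlyConf {
  // Key taken from the PR description; deliberately not a registered config entry.
  val ForceShjKey = "spark.sql.join.forceApplyShuffledHashJoin"

  // `settings` is a hypothetical stand-in for the session's string-valued config map.
  // An unset key falls back to "false", so the default behaviour is unchanged.
  def forceApplyShuffledHashJoin(settings: Map[String, String]): Boolean =
    settings.getOrElse(ForceShjKey, "false") == "true"
}
```

Keeping the key out of the registered entries means it does not show up alongside the documented, user-facing configs, which fits a test-only switch.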

  (!conf.preferSortMergeJoin && canBuildLocalHashMapBySize(left, conf) &&
-   muchSmaller(left, right))
+   muchSmaller(left, right)) ||
+   (Utils.isTesting && forceApplyShuffledHashJoin(conf))
nit: we can even move Utils.isTesting into forceApplyShuffledHashJoin
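
One way to read this suggestion, sketched under assumptions: the testing guard lives inside the helper, so the call site only ORs in a single predicate. `ForcedSelection`, the `settings` map, and the explicit `isTesting` parameter are hypothetical; in Spark the guard would come from Utils.isTesting rather than an argument.

```scala
object ForcedSelection {
  val ForceShjKey = "spark.sql.join.forceApplyShuffledHashJoin"

  // The testing guard is folded into the helper, so callers cannot forget it.
  // `isTesting` is an explicit parameter only to keep this sketch self-contained.
  def forceApplyShuffledHashJoin(settings: Map[String, String], isTesting: Boolean): Boolean =
    isTesting && settings.getOrElse(ForceShjKey, "false") == "true"

  // The call site then reduces to: the size-based choice OR the forced choice.
  def canBuildSide(
      sizeBasedChoice: Boolean,
      settings: Map[String, String],
      isTesting: Boolean): Boolean =
    sizeBasedChoice || forceApplyShuffledHashJoin(settings, isTesting)
}
```

With the guard inside the helper, setting the key outside a test run has no effect on join selection.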

SparkQA commented Jul 6, 2021

Test build #140701 has finished for PR 33182 at commit d0dfd8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-35984][SQL] Config to force applying shuffled hash join [SPARK-35984][SQL][TEST] Config to force applying shuffled hash join Jul 6, 2021
@cloud-fan
Contributor

thanks, merging to master/3.2 (to improve test coverage)

@cloud-fan cloud-fan closed this in 7566db6 Jul 6, 2021
cloud-fan pushed a commit that referenced this pull request Jul 6, 2021
Closes #33182 from linhongliu-db/SPARK-35984-hash-join-config.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7566db6)
Signed-off-by: Wenchen Fan <[email protected]>
SparkQA commented Jul 6, 2021

Test build #140710 has finished for PR 33182 at commit 81fdaae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon pushed a commit that referenced this pull request Jul 7, 2021
…in test in-joins.sql

### What changes were proposed in this pull request?

We found in https://issues.apache.org/jira/browse/SPARK-32577 that `in-joins.sql` does not test shuffled hash join properly, but did not find a good way to fix it at the time. Now that #33182 provides a test config to enforce shuffled hash join, we can fix the test here as well.

### Why are the changes needed?

Fix test to have better test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Reran the test to compare the output, and verified the query plan manually to make sure shuffled hash join is being used.

Closes #33236 from c21/join-test.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jul 7, 2021
…in test in-joins.sql
(cherry picked from commit f3c1159)
Signed-off-by: Hyukjin Kwon <[email protected]>
wangyum pushed a commit that referenced this pull request May 26, 2023
Closes #33182 from linhongliu-db/SPARK-35984-hash-join-config.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>