Make swap_hash_join public API#10702
Conversation
| fn swap_hash_join( | ||
| /// This function is public so other downstream projects can use it | ||
| /// to construct `HashJoinExec` with right side as the build side. | ||
| pub fn swap_hash_join( |
There was a problem hiding this comment.
Spark supports build right in HashJoin operator internally. In planning stage, it decides which side is build side.
However, DataFusion HashJoin planning takes alternative approach. DataFusion HashJoin operator only supports build left. In optimization stage, once DataFusion finds build right is better, it will call this function to swap inputs for HashJoin to achieve build right effect.
So, to make Comet to use DataFusion HashJoin to support Spark build right HashJoin, we only need this function to be public so Comet can directly use it.
There was a problem hiding this comment.
Given this input takes &HashJoinExec as input, it seems like it might be easier to find / make more sense as a method on HashJoinExec itself 🤔
So maybe we could update the code so this function is HashJoinExec::swap_inputs or something
There was a problem hiding this comment.
Sounds good. 👍
Let me update it accordingly.
There was a problem hiding this comment.
Hmm, some functions used in swap_hash_join are also used by other functions like swap_nl_join. So I cannot simply move these functions into HashJoinExec as private ones.
Only I can do is move these functions to joins/utils.rs in physical-plan crate and make them as public APIs so both join_selection.rs and HashJoinExec can use them.
It makes more APIs public actually, including swap_join_filter, swap_reverting_projection and swap_join_type.
I think it is not so worth making more public APIs just for swap_hash_join.
Seems the current approach is better?
WDYT?
| fn swap_hash_join( | ||
| /// This function is public so other downstream projects can use it | ||
| /// to construct `HashJoinExec` with right side as the build side. | ||
| pub fn swap_hash_join( |
There was a problem hiding this comment.
Given this input takes &HashJoinExec as input, it seems like it might be easier to find / make more sense as a method on HashJoinExec itself 🤔
So maybe we could update the code so this function is HashJoinExec::swap_inputs or something
| @@ -157,7 +157,9 @@ fn swap_join_projection( | |||
| } | |||
|
|
|||
| /// This function swaps the inputs of the given join operator. | |||
There was a problem hiding this comment.
| /// This function swaps the inputs of the given join operator. | |
| /// This function swaps the inputs of the given join operator, to construct `HashJoinExec` with right side as the build side |
|
|
||
| /// This function swaps the inputs of the given join operator. | ||
| fn swap_hash_join( | ||
| /// This function is public so other downstream projects can use it |
There was a problem hiding this comment.
| /// This function is public so other downstream projects can use it |
There was a problem hiding this comment.
Without a clear comment, it might be wrongly moved out of public APIs easily.
There was a problem hiding this comment.
I agree but the entire idea of pub modifier implies 3rd party users to use the pub method? 🤔
| /// This function swaps the inputs of the given join operator. | ||
| fn swap_hash_join( | ||
| /// This function is public so other downstream projects can use it | ||
| /// to construct `HashJoinExec` with right side as the build side. |
There was a problem hiding this comment.
| /// to construct `HashJoinExec` with right side as the build side. |
There was a problem hiding this comment.
I think there is a benefit in leaving the documentation that explains the rationale for why this is public
|
Thanks @andygrove @alamb @Dandandan @comphead |
Which issue does this PR close?
Closes #9603.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?