[SPARK-2240][SQL]Spark SQL add LeftSemiBloomFilterBroadcastJoin #1127
Hi all. I want to submit a join operator called LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB). Sometimes the broadcast table of a semi join does not fit in memory. In that case we can build a Bloom filter from it to reduce its size, broadcast the filter, and perform a map-side join. Some of the code is based on the HashJoin and BroadcastNestedLoopJoin implementations. The Bloom filter code uses Shark's BloomFilter class implementation.
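For readers unfamiliar with the idea, here is a minimal, language-agnostic sketch in Python of a Bloom-filter-based left semi join. It is not the code from this patch and does not use Spark or Shark APIs: the `BloomFilter` class, the `left_semi_bloom_join` function, the hashing scheme, and the default sizes are all hypothetical and chosen only for illustration. Partitions are modeled as plain lists, and "broadcasting" is modeled as sharing one filter object across them.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter (not Shark's BloomFilter class)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def left_semi_bloom_join(left_partitions, right_keys, num_bits=1 << 16, num_hashes=4):
    """Keep left rows whose key (row[0]) may appear among right_keys."""
    # Build side: compress the right-hand keys into a small filter.
    bloom = BloomFilter(num_bits, num_hashes)
    for key in right_keys:
        bloom.add(key)
    # Probe side: each partition filters its rows locally (map side),
    # using the same shared filter, which stands in for the broadcast.
    return [
        [row for row in partition if bloom.might_contain(row[0])]
        for partition in left_partitions
    ]
```

Because a Bloom filter can report false positives but never false negatives, the output of this sketch is a superset of the exact semi join result: every matching left row is kept, and a few non-matching rows may slip through depending on the filter size and the number of hash functions.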
Can one of the admins verify this patch?
Mind formatting as the rest of the code base?
Thanks a lot, I will reformat it.
Reformat the code as intent 4
Reformat the intent and annotation
Hi Zongheng, I have reformatted the code. I don't know if it is OK now, and I hope you can give me more suggestions. Thanks a lot.
Indent 4 spaces. Also I'd go with the full more descriptive name instead of BFB since we are only going to have to type it out in like 2 places.
Hi all, I have resolved the conflict. I don't know whether this PR is worth merging.
Hi @YanjieGao, thanks for working on this! I think it would be great to support this optimization. However, I think the hardest part here is going to be figuring out how to hook this into the planner such that it is chosen when the data requires it, and otherwise we use the standard join algorithms. Since I think that is going to be a pretty large task, perhaps it would be best to close this issue for now and revisit it when we have a full design for how to choose join operators.
Hi marmbrus, got it. If I have any other good ideas I will get in touch with you. Thanks, I will close it later.
JIRA: https://issues.apache.org/jira/browse/SPARK-2240