@YanjieGao
Contributor

Hi all,
JIRA: https://issues.apache.org/jira/browse/SPARK-2240
I want to submit a join operator called LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB).
Sometimes the broadcast table of a semi join can't fit in memory. In that case we can build a Bloom filter from it to reduce its size, broadcast the filter, and do a map-side join.
Some of the code references the HashJoin and BroadcastNestedLoopJoin implementations.
The Bloom filter code uses Shark's BloomFilter class implementation.
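
To make the idea concrete, here is a minimal sketch in plain Python rather than the Spark SQL code in this patch. The names `BloomFilter`, `left_semi_join_bloom`, and `key_of` are hypothetical, partitions are modelled as plain lists, and "broadcasting" is simply sharing one filter object. It assumes string join keys and the standard Bloom filter sizing formulas. Note that a Bloom filter can return false positives, so the output is a superset of the exact left semi join unless an exact check follows.

```python
import hashlib
import math


class BloomFilter:
    """Tiny Bloom filter over string keys; illustrative only."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(
            8, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def left_semi_join_bloom(left_partitions, right_keys, key_of):
    """Keep the left rows whose join key may appear among right_keys."""
    right_keys = list(right_keys)
    bloom = BloomFilter(expected_items=len(right_keys))
    for key in right_keys:
        bloom.add(key)
    # "Broadcast": every partition filters locally against the same small filter.
    return [
        [row for row in part if bloom.might_contain(key_of(row))]
        for part in left_partitions
    ]


left = [[("a", 1), ("b", 2)], [("c", 3), ("a", 4)]]
result = left_semi_join_bloom(left, ["a", "c"], key_of=lambda row: row[0])
```

Rows keyed `"a"` and `"c"` are always kept, because a Bloom filter has no false negatives. The row keyed `"b"` is dropped unless it happens to hit a false positive.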

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

Mind formatting as the rest of the code base?

@YanjieGao
Contributor Author

Thanks a lot, I will reformat it.

Reformat the code as intent 4
Reformat the intent and annotation
@YanjieGao
Contributor Author

Hi Zongheng, I reformatted the code. I don't know if it is OK now, and I hope you can give me more suggestions. Thanks a lot.

Contributor

Indent 4 spaces. Also, I'd go with the full, more descriptive name instead of BFB, since we are only going to have to type it out in like 2 places.

@YanjieGao YanjieGao changed the title Spark SQL add LeftSemiBloomFilterBroadcastJoin [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFilterBroadcastJoin Jun 22, 2014
@YanjieGao
Contributor Author

Hi all, I have resolved the conflict. I don't know whether this PR is valuable enough to be merged.

@marmbrus
Contributor

marmbrus commented Sep 3, 2014

Hi @YanjieGao, thanks for working on this! I think it would be great to support this optimization. However, I think the hardest part here is going to be figuring out how to hook this into the planner such that it is chosen when the data requires it, and otherwise we use the standard join algorithms. Since I think that is going to be a pretty large task, perhaps it would be best to close this issue for now and revisit it when we have a full design for how to choose join operators.
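
As a purely illustrative sketch of the kind of decision the planner would have to make (this is not Spark's planner and not a design proposed in this thread), a size-based rule might look like the following. The function name, the 10 MB threshold, and the 10 bits-per-key estimate are all assumptions chosen for the example.

```python
def choose_semi_join_strategy(
    right_size_bytes,
    right_row_count,
    broadcast_threshold_bytes=10 * 1024 * 1024,  # hypothetical broadcast limit
    bits_per_key=10,  # hypothetical Bloom filter budget (roughly 1% false positives)
):
    """Pick a left semi join strategy from rough size estimates of the right side."""
    # If the whole right side is small, broadcast it and join exactly.
    if right_size_bytes <= broadcast_threshold_bytes:
        return "broadcast_hash"
    # Otherwise, check whether a Bloom filter over its keys is small enough.
    bloom_bytes = right_row_count * bits_per_key // 8
    if bloom_bytes <= broadcast_threshold_bytes:
        return "bloom_filter_broadcast"
    # Fall back to a standard shuffle-based join.
    return "shuffle"


small = choose_semi_join_strategy(1 * 1024 * 1024, 1_000)
medium = choose_semi_join_strategy(1024 ** 3, 1_000_000)
large = choose_semi_join_strategy(100 * 1024 ** 3, 10 ** 10)
```

The hard part pointed out above is exactly what this sketch glosses over: obtaining reliable size and row-count estimates at planning time.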

@YanjieGao
Contributor Author

Hi marmbrus, got it. If I have some other good ideas I will try to communicate with you. Thanks, I will close it later.

@YanjieGao YanjieGao closed this Sep 3, 2014
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025
4 participants