HSE course work: feature/in filter operator #79648
afigor2701 wants to merge 7 commits into ClickHouse:master
|
Great! Good work! 👏🏼👏🏼👏🏼 |
|
Could you please write some tests and documentation for the new function? |
|
@afigor2701 Please check the comment from hanfei1991 and check the indentation/style of |
2b5445a to ad818b8
|
Can you please add documentation for the new functions? Test cases are also needed - please check examples in |
|
@afigor2701 Just checking, are you currently working on this? I took the PR for a spin and found a couple of quick issues: 1) There is a crash if the query is processed by multiple threads (No crash if |
Yes, I am going to continue working on this issue. Thank you for the comments, I will try to fix these issues |
|
Dear @shankar-iyer, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new IN_BLOOM_FILTER and IN_CUCKOO_FILTER operators.
This PR closes the 'Probabilistic data structures for filtering' issue from #71175.
Documentation entry for user-facing changes
The idea of this task is to provide a probabilistic alternative to the IN (subquery) operator using a bloom filter, a counting bloom filter (to check for elements that likely appeared multiple times), a cuckoo filter, a quotient filter, and a vacuum filter, and to compare all these algorithms.
The applications are cohort analysis and anti-fraud.
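To illustrate the idea of a probabilistic IN, here is a minimal Python sketch of a bloom-filter membership check. It is not the code from this PR: the class `BloomFilter`, the helper `in_bloom_filter`, the parameters `num_bits` and `num_hashes`, and the sample user IDs are all hypothetical names chosen for illustration. The filter is built from the subquery side and probed with each row of the outer side; it never drops a true match, but it may keep some rows that are not in the subquery result.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def in_bloom_filter(rows, subquery_values, num_bits=1024, num_hashes=4):
    """Approximate `x IN (subquery)`: build from the subquery, probe with rows."""
    bf = BloomFilter(num_bits, num_hashes)
    for value in subquery_values:
        bf.add(value)
    return [row for row in rows if bf.might_contain(row)]


# Hypothetical cohort-analysis data: keep events whose user is in the cohort.
cohort_user_ids = [3, 17, 42, 99]
event_user_ids = [1, 3, 5, 17, 42, 42, 100]

approx = in_bloom_filter(event_user_ids, cohort_user_ids)
exact = [u for u in event_user_ids if u in set(cohort_user_ids)]
# `approx` always contains every element of `exact`; it may contain extra rows.
```

Because the result is a superset of the exact answer, a sketch like this fits cases where a small number of false positives is acceptable or where the filtered rows are re-checked exactly afterwards.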