-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Better Hyperloglog cardinality estimation algorithm #2073
Copy link
Copy link
Closed
Labels
st-need-infoWe need extra data to continue (waiting for response). Either some details or a repro of the issue.We need extra data to continue (waiting for response). Either some details or a repro of the issue.
Description
Otmar Ertl (@oertl) did a great job on HyperLogLog improvements in his article "New cardinality estimation algorithms for HyperLogLog sketches" https://arxiv.org/abs/1702.01284. Later he proposed to change Redis implementation of HyperLogLog (https://github.com/antirez/redis/pull/4749) and @antirez found that the new implementation is around 20% faster, simpler and has lower theoretical error.
We could improve ClickHouse uniqHLL12, uniqCombined aggregate functions and maybe introduce new function uniqHLL(q, p)(x) using same algorithm.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
st-need-infoWe need extra data to continue (waiting for response). Either some details or a repro of the issue.We need extra data to continue (waiting for response). Either some details or a repro of the issue.