Skip to content

Better Hyperloglog cardinality estimation algorithm #2073

@bocharov

Description

@bocharov

Otmar Ertl (@oertl) did a great job on HyperLogLog improvements in his article "New cardinality estimation algorithms for HyperLogLog sketches" https://arxiv.org/abs/1702.01284. Later he proposed to change Redis implementation of HyperLogLog (https://github.com/antirez/redis/pull/4749) and @antirez found that the new implementation is around 20% faster, simpler and has lower theoretical error.

We could improve ClickHouse uniqHLL12, uniqCombined aggregate functions and maybe introduce new function uniqHLL(q, p)(x) using same algorithm.

Metadata

Metadata

Labels

st-need-infoWe need extra data to continue (waiting for response). Either some details or a repro of the issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions