Skip to content

Port HyperLogLog implementation to Rust#8095

Merged
LukeMathWalker merged 8 commits intomasterfrom
hll-port
Jan 26, 2026
Merged

Port HyperLogLog implementation to Rust#8095
LukeMathWalker merged 8 commits intomasterfrom
hll-port

Conversation

@LukeMathWalker
Copy link
Collaborator

@LukeMathWalker LukeMathWalker commented Jan 20, 2026

Describe the changes in the pull request

A Rust port of the HyperLogLog implementation from hll.c.

Benchmarks

I've added benches for various hashers—the original C fnv hash, as well as a few others.
Speed-wise, fnv is the third fastest, but accuracy is an issue:

n fnv murmur3 xxhash32 ahash fxhash wyhash
100 101 ( 1.00%) 100 ( 0.00%) 100 ( 0.00%) 101 ( 1.00%) 99 ( 1.00%) 101 ( 1.00%)
1000 1036 ( 3.60%) 1013 ( 1.30%) 991 ( 0.90%) 997 ( 0.30%) 983 ( 1.70%) 1004 ( 0.40%)
10000 8630 (13.70%) 10397 ( 3.97%) 10435 ( 4.35%) 9858 ( 1.42%) 10334 ( 3.34%) 10527 ( 5.27%)
100000 52804 (47.20%) 99283 ( 0.72%) 100909 ( 0.91%) 97607 ( 2.39%) 98307 ( 1.69%) 97695 ( 2.31%)

The table reports the correct cardinality in the n column, the estimated cardinality for each hasher as well as the error rate.
fnv degrades really fast. All things considered (speed + accuracy), it might be a good idea to migrate over to either fxhash or wyhash later on.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Adds a self-contained hyperloglog crate implementing HLL with const generics, stack-allocated registers, cached counts, merge/clear, and pluggable hashers via hash32::Hasher (default CFnvHasher matching the C seed; Murmur3Hasher alias provided).

  • New APIs: HyperLogLog<BITS, SIZE, H>, type aliases (HyperLogLog10, etc.), add, add_precomputed_hash, count, merge, clear, from_registers, try_set_registers
  • Benchmarks compare fnv, murmur3, xxhash32, ahash, fxhash, wyhash; prints accuracy and measures add/count/merge throughput
  • Extensive integration/property tests validate accuracy, merging, caching, register validation, and debug output
  • fnv crate: implements hash32::Hasher for Fnv32 to support 32-bit hashes and exposes C-compatible seeded wrapper (CFnvHasher)
  • Workspace updates: adds hyperloglog member; adds deps (ahash, bytecount, hash32, rustc-hash, wyhash, xxhash-rust) and minor version bumps in Cargo.lock

Written by Cursor Bugbot for commit 2ee0a24. This will update automatically on new commits. Configure here.

@jit-ci
Copy link

jit-ci bot commented Jan 20, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@LukeMathWalker LukeMathWalker marked this pull request as ready for review January 21, 2026 07:30
cursor[bot]

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 96.91358% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.54%. Comparing base (91fd637) to head (2ee0a24).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/redisearch_rs/hyperloglog/src/fnv.rs 75.00% 3 Missing ⚠️
src/redisearch_rs/hyperloglog/src/lib.rs 98.63% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8095      +/-   ##
==========================================
- Coverage   83.07%   82.54%   -0.54%     
==========================================
  Files         361      368       +7     
  Lines       54821    55352     +531     
  Branches    13809    14340     +531     
==========================================
+ Hits        45541    45688     +147     
- Misses       9133     9516     +383     
- Partials      147      148       +1     
Flag Coverage Δ
flow 84.35% <ø> (-0.02%) ⬇️
unit 49.61% <96.91%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

cursor[bot]

This comment was marked as outdated.

pub struct HyperLogLog<const BITS: u8, const SIZE: usize, H: hash32::Hasher + Default = CFnvHasher>
{
cached_card: Cell<Option<usize>>,
registers: Box<[u8; SIZE]>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For small sizes, why put it in a box? (We use BITS=6 for numeric index today)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's rather tricky to have it on the stack up to a certain size and on the heap afterwards (especially with const generics). We could limit ourselves to the stack, which is going to be fine up to 10 bits precision.
Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, if we are going to have 2 implementations for different use-cases, I think it makes sense to make this one in-place only and allow BITS <= 10.
Also, for the numeric use-case, I wonder if we can use even BITS=5 with the new hasher. Currently#define NR_MAXRANGE_CARD 2500

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can play around with it when we are integrating the numeric index and see how it fares!

@LukeMathWalker LukeMathWalker force-pushed the hll-port branch 2 times, most recently from cd513de to cac2830 Compare January 26, 2026 11:35
GuyAv46
GuyAv46 previously approved these changes Jan 26, 2026
@GuyAv46
Copy link
Collaborator

GuyAv46 commented Jan 26, 2026

I guess we can close #7356.

GuyAv46
GuyAv46 previously approved these changes Jan 26, 2026
@LukeMathWalker
Copy link
Collaborator Author

Rebased on latest master to resolve conflicts.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

GuyAv46
GuyAv46 previously approved these changes Jan 26, 2026
@LukeMathWalker LukeMathWalker added this pull request to the merge queue Jan 26, 2026
Merged via the queue into master with commit 31c020d Jan 26, 2026
78 of 92 checks passed
@LukeMathWalker LukeMathWalker deleted the hll-port branch January 26, 2026 18:11
@LukeMathWalker LukeMathWalker restored the hll-port branch February 6, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants