Optimize `BITCOUNT` with AVX2 and AVX512 popcount implementations. by fcostaoliveira · Pull Request #14309 · redis/redis

fcostaoliveira · 2025-08-27T00:24:31Z

This PR introduces vectorized implementations of BITCOUNT for x86_64 targets with AVX2 and AVX512 support.

- AVX2 path: processes 32 bytes at a time, using unrolled POPCNT on 64-bit lanes with independent accumulators to reduce data dependencies.
- AVX512 path: uses VPOPCNTDQ on 64-byte chunks, with `_mm512_reduce_add_epi64` to aggregate results across 512-bit vectors.
- Both paths include cache prefetching hints to better overlap memory fetches with computation. This was shown to matter in "Add 2K software prefetch to improve BITCOUNT performance" (#14103).
- Falls back to the scalar implementation if hardware support is unavailable.
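The accumulator idea described for the AVX2 path can be illustrated with a minimal, portable sketch. This is not the code from the PR: it uses the GCC/Clang builtins `__builtin_popcountll` and `__builtin_prefetch` instead of AVX2 intrinsics, the name `popcount_unrolled` is hypothetical, and the 2048-byte prefetch distance is an assumption loosely based on the "2K" in the title of #14103.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed look-ahead distance in bytes; purely illustrative. */
#define PREFETCH_AHEAD 2048

/* Hypothetical helper, not the function added by the PR.
 * Counts set bits in buf[0..len) using 32-byte steps. */
static uint64_t popcount_unrolled(const void *buf, size_t len) {
    const unsigned char *p = buf;
    /* Four independent accumulators: each add depends only on its own
     * previous value, so the four popcounts per step can overlap. */
    uint64_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;

    while (len >= 32) {
        uint64_t w[4];
        /* Prefetch is only a hint; the address is computed as an integer
         * so no out-of-bounds pointer is formed. */
        __builtin_prefetch((const void *)((uintptr_t)p + PREFETCH_AHEAD), 0, 0);
        memcpy(w, p, sizeof(w)); /* safe for unaligned input */
        c0 += (uint64_t)__builtin_popcountll(w[0]);
        c1 += (uint64_t)__builtin_popcountll(w[1]);
        c2 += (uint64_t)__builtin_popcountll(w[2]);
        c3 += (uint64_t)__builtin_popcountll(w[3]);
        p += 32;
        len -= 32;
    }

    /* Tail: fewer than 32 bytes remain, count them one byte at a time. */
    uint64_t total = c0 + c1 + c2 + c3;
    while (len--) total += (uint64_t)__builtin_popcount(*p++);
    return total;
}
```

Whether the compiler emits a hardware POPCNT instruction for the builtin depends on the target flags in use, so this sketch shows the loop structure only and says nothing about the speedups reported below.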

The test suite has been expanded with unit tests that validate correctness across aligned/unaligned buffers, edge cases, random data, and large workloads, ensuring consistency between scalar, AVX2, and AVX512 implementations.
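As a rough illustration of that kind of consistency check, the sketch below compares a bit-by-bit reference counter against a word-at-a-time counter over every combination of start offset and length in a small range. It is not the test code from the PR: all function names are hypothetical, `popcount_words` is only a stand-in for an optimized path, and the buffer sizes and the xorshift generator are arbitrary choices.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: inspect every bit individually. Slow but hard to get wrong. */
static uint64_t popcount_reference(const unsigned char *p, size_t n) {
    uint64_t bits = 0;
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 8; b++)
            bits += (p[i] >> b) & 1u;
    return bits;
}

/* Stand-in for an optimized path: 8 bytes per step plus a byte tail. */
static uint64_t popcount_words(const unsigned char *p, size_t n) {
    uint64_t bits = 0;
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, p, sizeof(w)); /* tolerates unaligned addresses */
        bits += (uint64_t)__builtin_popcountll(w);
        p += 8;
        n -= 8;
    }
    while (n--) bits += (uint64_t)__builtin_popcount(*p++);
    return bits;
}

/* Small deterministic PRNG so failures are reproducible from the seed. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Returns how many (offset, length) pairs disagree between the two
 * implementations; 0 means they were consistent on this data. */
static int count_mismatches(uint32_t seed) {
    enum { MAX_LEN = 300, MAX_OFFSET = 8 };
    unsigned char buf[MAX_LEN + MAX_OFFSET];
    int mismatches = 0;

    if (seed == 0) seed = 1; /* xorshift must not start from zero */
    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char)xorshift32(&seed);

    /* Offsets 0..7 exercise unaligned starts; lengths 0..MAX_LEN
     * exercise the empty case, the tail loop, and the main loop. */
    for (size_t off = 0; off < MAX_OFFSET; off++)
        for (size_t len = 0; len <= MAX_LEN; len++)
            if (popcount_reference(buf + off, len) !=
                popcount_words(buf + off, len))
                mismatches++;
    return mismatches;
}
```

Sweeping offsets and lengths together is what catches off-by-one errors at the boundary between a wide main loop and its tail handling.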

Performance Results on Intel Xeon SPR (single shard)

| Test Case | Baseline `/redis unstable` (median obs. ± std.dev) | Comparison `filipecosta90/redis optimize.bitcount.avx` (median obs. ± std.dev) | % change (lower-better) | Note |
|---|---|---|---|---|
| memtier_benchmark-1key-100M-bits-bitmap-bitcount | 6.239 ± 0.5% (7 datapoints) | 5.759 ± 0.3% (3 datapoints) | −7.7% | Better |
| memtier_benchmark-1key-1Billion-bits-bitmap-bitcount | 59.647 ± 0.7% (7 datapoints) | 53.503 ± 0.3% (3 datapoints) | −10.3% | Better |

Performance Results on AMD EPYC 9R14 (single shard)

| Test Case | Metric | Baseline `/redis unstable` (median obs. ± std.dev) | Comparison `filipecosta90/redis optimize.bitcount.avx` (median obs. ± std.dev) | % change (lower-better) |
|---|---|---|---|---|
| memtier_benchmark-1key-100M-bits-bitmap-bitcount | Ops/sec (↑ better) | 37,120.79 | 52,093.55 | +40.2% |
| | Latency (p50 / p99) | 5.279 ms / 9.535 ms | 3.807 ms / 6.399 ms | −27.9% / −33.0% |

Reproduce Benchmarks

pip3 install redis-benchmarks-specification
redis-benchmarks-spec-client-runner \
  --test memtier_benchmark-1key-1Billion-bits-bitmap-bitcount.yml \
  --db_server_host <...> --db_server_port <...> --db_server_password <...> \
  --flushall_on_every_test_start

snyk-io · 2025-08-27T00:24:43Z

🎉 Snyk checks have passed. No issues have been found so far.

✅ security/snyk check is complete. No issues have been found.

✅ license/snyk check is complete. No issues have been found.

kaplanben · 2025-08-27T00:54:25Z

Checkmarx One – Scan Summary & Details – cae2c0a3-4572-43a2-919f-2bfd50f53efe

Great job! No new security vulnerabilities introduced in this pull request

shahsb

The benchmarks focus on very large bitmaps (100M and 1B bits), where this optimization will shine. Is there any data on performance for very small strings (e.g., 16, 32, or 64 bytes)? While unlikely to be slower, it would be useful to confirm that the overhead of the function dispatch and setup for the vectorized paths doesn't negatively impact performance on "tiny" workloads.
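One way the small-input concern could be handled is a dispatcher that sends short buffers straight to the scalar path and resolves the wide path only once. The sketch below is an illustration under stated assumptions, not the PR's dispatch logic: `popcount_auto`, `popcount_wide`, and the 64-byte `SMALL_INPUT_BYTES` cutoff are all hypothetical, and `popcount_wide` is a plain C stand-in rather than an AVX2 kernel. The CPU check uses the GCC/Clang builtin `__builtin_cpu_supports`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*popcount_fn)(const unsigned char *, size_t);

/* Scalar fallback: one byte at a time. */
static uint64_t popcount_bytes(const unsigned char *p, size_t n) {
    uint64_t bits = 0;
    while (n--) bits += (uint64_t)__builtin_popcount(*p++);
    return bits;
}

/* Stand-in for a vectorized kernel (hypothetical, plain C here). */
static uint64_t popcount_wide(const unsigned char *p, size_t n) {
    uint64_t bits = 0;
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, p, sizeof(w));
        bits += (uint64_t)__builtin_popcountll(w);
        p += 8;
        n -= 8;
    }
    return bits + popcount_bytes(p, n);
}

/* Runtime feature check; reports 0 on non-x86 or unknown compilers. */
static int cpu_has_avx2(void) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Lazily resolved implementation. Not thread-safe: this sketch assumes
 * a single calling thread. */
static popcount_fn wide_impl;
static int resolved;

static uint64_t popcount_auto(const unsigned char *p, size_t n) {
    enum { SMALL_INPUT_BYTES = 64 }; /* assumed cutoff, not measured */

    /* Tiny inputs skip feature detection and the indirect call. */
    if (n < SMALL_INPUT_BYTES) return popcount_bytes(p, n);

    if (!resolved) {
        wide_impl = cpu_has_avx2() ? popcount_wide : popcount_bytes;
        resolved = 1;
    }
    return wide_impl(p, n);
}
```

A real cutoff would have to come from measurements on the sizes the question mentions (16, 32, and 64 bytes); the value here is a placeholder.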

shahsb

Really impressive work here! The 10% improvement on large bitmaps is substantial. I'm curious what was the most challenging part of getting this optimization right?

sundb

LGTM

sundb · 2025-10-13T12:56:34Z

> Optimize BITCOUNT on Intel with AVX2 and AVX512 popcount implementations

Doesn't this PR have no improvement for AMD?

fcostaoliveira · 2025-10-13T13:03:16Z

> Doesn't this PR have no improvement for AMD?

It should. We only have runners for Intel/ARM on CI, mainly due to cost, but I'll add data for AMD from a manual run.

fcostaoliveira · 2025-10-13T21:16:28Z

> Optimize BITCOUNT on Intel with AVX2 and AVX512 popcount implementations
>
> Doesn't this PR have no improvement for AMD?

@sundb I added results for AMD EPYC 9R14 (single shard) and updated the main comment as well:

| Test Case | Metric | Baseline `/redis unstable` (median obs. ± std.dev) | Comparison `filipecosta90/redis optimize.bitcount.avx` (median obs. ± std.dev) | % change (lower-better) |
|---|---|---|---|---|
| memtier_benchmark-1key-100M-bits-bitmap-bitcount | Ops/sec (↑ better) | 37,120.79 | 52,093.55 | +40.2% |
| | Latency (p50 / p99) | 5.279 ms / 9.535 ms | 3.807 ms / 6.399 ms | −27.9% / −33.0% |

fcostaoliveira added 4 commits August 13, 2025 15:22

AVX2 and AVX512 implementations of bitcount

00327c1

NEON bitCount

b67157e

Fix NEON pragma

15b41d5

Enhance BITCOUNT with AVX2/AVX512 optimized popcount implementations.

8c57006

fcostaoliveira requested a review from sundb August 27, 2025 00:24

shahsb reviewed Oct 2, 2025

shahsb suggested changes Oct 2, 2025

sundb reviewed Oct 9, 2025

src/bitops.c (resolved)

src/bitops.c (outdated, resolved)

sundb reviewed Oct 9, 2025

src/bitops.c (outdated, resolved)

fcostaoliveira added 6 commits October 11, 2025 23:18

Merge remote-tracking branch 'origin/unstable' into optimize.bitcount.avx

b8245fa

fixes per PR review: share bitinbyte across all popcount variants

b698e2c

Added redisPopCountAuto

203b192

Removed spurious code changes

a642885

Merge remote-tracking branch 'filipe/optimize.bitcount.avx' into optimize.bitcount.avx

0a3db3f

Merge remote-tracking branch 'origin/unstable' into optimize.bitcount.avx

efebf1f

fcostaoliveira requested a review from sundb October 11, 2025 22:48

sundb reviewed Oct 12, 2025

src/bitops.c (outdated, resolved)

src/bitops.c (outdated, resolved)

src/config.h (outdated, resolved)

fcostaoliveira and others added 3 commits October 12, 2025 13:25

Apply suggestions from code review

719b73a

Co-authored-by: debing.sun <[email protected]>

Merge remote-tracking branch 'origin/unstable' into optimize.bitcount.avx

75c7bcc

splitted popcount target from avx2/avx512 targets

5a28896

fcostaoliveira requested a review from sundb October 12, 2025 12:30

sundb approved these changes Oct 12, 2025

sundb reviewed Oct 13, 2025

src/bitops.c (resolved)

sundb added this to Redis 8.4 Oct 13, 2025

github-project-automation bot moved this to Todo in Redis 8.4 Oct 13, 2025

Update src/bitops.c

cbfab40

Co-authored-by: debing.sun <[email protected]>

sundb reviewed Oct 13, 2025

src/bitops.c (resolved)

src/config.h (resolved)

Update src/bitops.c

4b06215

Co-authored-by: debing.sun <[email protected]>

Consolidation of conditional blocks for avx2/avx512/popcount check

0481b17

sundb reviewed Oct 13, 2025

sundb changed the title ~~Optimize BITCOUNT on Intel with AVX2 and AVX512 popcount implementations.~~ Optimize BITCOUNT with AVX2 and AVX512 popcount implementations. Oct 14, 2025

sundb merged commit 119c83d into redis:unstable Oct 14, 2025
19 checks passed

github-project-automation bot moved this from Todo to Done in Redis 8.4 Oct 14, 2025

fcostaoliveira mentioned this pull request Oct 14, 2025

Fix nested function on bitopsTest #14430

Merged

sundb mentioned this pull request Feb 10, 2026

Optimize while loop to use 4 elements instead of 7 for calculation, improving performance on older CPUs #13719

Closed
