MOD-9769: Implement varint encoding/decoding in Rust by zeenix · Pull Request #6287 · RediSearch/RediSearch

zeenix · 2025-06-09T14:03:45Z

Describe the changes in the pull request

This PR re-implements Varint encoding & decoding in Rust. It also adds benchmarks that compare the
performance and memory usage of C vs Rust implementations.

A recent sample run of the benchmarks on my laptop:

❯ cargo bench

Varint|Encode/Rust      time:   [660.06 ns 662.24 ns 665.10 ns]
                        change: [+0.9330% +2.4229% +3.9848%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
Varint|Encode/C         time:   [665.43 ns 666.27 ns 667.36 ns]
                        change: [+2.6780% +4.0123% +5.3601%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

Varint|Decode/Rust      time:   [135.24 ns 135.37 ns 135.50 ns]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
Varint|Decode/C         time:   [319.02 ns 319.54 ns 320.19 ns]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  7 (7.00%) high severe

Varint|Encode FieldMask/Rust
                        time:   [672.07 ns 673.37 ns 675.06 ns]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe
Varint|Encode FieldMask/C
                        time:   [668.89 ns 670.55 ns 672.26 ns]
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

Varint|Decode FieldMask/Rust
                        time:   [239.62 ns 239.89 ns 240.21 ns]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Varint|Decode FieldMask/C
                        time:   [351.79 ns 352.80 ns 353.91 ns]
Found 19 outliers among 100 measurements (19.00%)
  8 (8.00%) low severe
  2 (2.00%) high mild
  9 (9.00%) high severe

Varint Vector Writer|Vector Writer/Rust
                        time:   [582.04 ns 585.19 ns 588.42 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Varint Vector Writer|Vector Writer/C
                        time:   [658.85 ns 660.69 ns 662.67 ns]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

As you can see the performance is similar, except in case of:

Vector writer, which is about 100 ns faster than the C implementation. I suspect this is due to
Rust side using a Vec and C side using the internal Buffer API and the former being more
efficient than latter at allocations/growth.
Decoder, which is unbelievably faster than the C implementation. The reason is that to be able to
call the C decoder functions from Rust, I had to create non-inlined wrappers around the
static inline ones and hence that removes the advantage of inlining from the C side. If you add
#[inline(never)] to the Rust functions, the benchmark results become quite comparable for Varint
and Rust's FieldMask decoding becomes 2x slower than its C counterpart:

Varint|Decode/Rust      time:   [309.14 ns 310.67 ns 312.31 ns]
                        change: [+126.58% +129.49% +132.24%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Varint|Decode/C         time:   [310.92 ns 311.30 ns 311.77 ns]
                        change: [-3.3777% -1.5450% +1.1390%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

Varint|Decode FieldMask/Rust
                        time:   [695.86 ns 696.72 ns 697.73 ns]
                        change: [+186.12% +190.35% +193.93%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  14 (14.00%) high severe
Varint|Decode FieldMask/C
                        time:   [352.47 ns 353.46 ns 354.49 ns]
                        change: [-1.0641% +0.2702% +1.6579%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

Side notes:

This PR was split from Varint oxidization #5996, rebased and then benchmarks were added on top.
Once this PR is merged, the next step would be to drop the use of the C implementation in favor of this.

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

src/redisearch_rs/encode_decode/Cargo.toml

src/redisearch_rs/encode_decode/src/varint/mod.rs

src/redisearch_rs/varint_bencher/src/c_varint.rs

src/varint.c

src/redisearch_rs/varint_bencher/README.md

chesedo

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)? Do we know why it is slower?

Also did the memory usage show they have the same compression?

zeenix · 2025-06-10T09:49:50Z

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)?

We don't disable inlining so I don't think we need to be concerned.

Also did the memory usage show they have the same compression?

Yes. I should have included the output of that here. I'll add that soon.

chesedo · 2025-06-10T12:44:21Z

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)?

We don't disable inlining so I don't think we need to be concerned.

My concern is more of, "what if that means the Rust inlined version is slower than the C version too?". We currently only know the inlined Rust version is faster than the non-inlined C version. And since the non-inlined Rust version is slower than the non-inlined C version, then maybe than means Rust is slower when inlined too.

zeenix · 2025-06-10T13:57:42Z

My concern is more of, "what if that means the Rust inlined version is slower than the C version too?". We currently only know the inlined Rust version is faster than the non-inlined C version. And since the non-inlined Rust version is slower than the non-inlined C version, then maybe than means Rust is slower when inlined too.

I understand but given that:

We would never put #[inline(never)] on the Rust functions and hence always let the compiler+linker inline where it makes sense.
We're talking a few hundred nanoseconds for millions of iterations here in the worst case scenario.

I'm not sure there is anything to worry about here. Besides, the code is identical so I'm not sure what can be done here.

src/redisearch_rs/encode_decode/Cargo.toml

src/redisearch_rs/encode_decode/src/varint/mod.rs

src/redisearch_rs/varint_bencher/src/main.rs

zeenix · 2025-06-11T12:30:40Z

Just (clean) rebased on current master.

src/redisearch_rs/varint_bencher/src/main.rs

src/redisearch_rs/varint_bencher/src/c_varint.rs

dor-forer · 2025-06-11T12:51:58Z

src/redisearch_rs/varint_bencher/src/bencher.rs

+    let mut buf = Vec::with_capacity(1024);
+    group.bench_function("Rust", |b| {
+        b.iter(|| {
+            for &value in values {


Should buf be cleared or reset before each value, like in c benchmark?

The Buffer wrapper is a C-specific thing and not needed for/by Rust so there is nothing to reset.

Although now that you mention it, perhaps the C benchmark doesn't need to reset the buffer. I'll check..

Oh, I didn't realize that this is the writer part. It indeed does need to reset at least the offset but only resetting the offset bit does seem to make a small difference (C encoding becomes 10 nanoseconds faster on my machine).

We should create a fresh buffer for every iteration here (or resetting the writing cursor); otherwise, we end up measuring the growth algorithm of the buffer.

src/redisearch_rs/encode_decode/src/varint/vector_writer.rs

src/redisearch_rs/varint_bencher/.gitignore

src/varint.h

src/varint.c

src/redisearch_rs/encode_decode/src/varint/mod.rs

src/redisearch_rs/encode_decode/src/varint/vector_writer.rs

This is the Rust re-implementation of the Varint* API. We'll use this later to replace the C version.

zeenix · 2025-06-12T10:59:50Z

All review comments addressed and rebased.

src/redisearch_rs/encode_decode/src/varint/vector_writer.rs

This reverts commit 6597daa.

zeenix · 2025-06-12T17:50:31Z

There is a strange linker error with the benchmarking code when building for coverage. I ran out of time before I could fix it (and I was under the impression that coverage job is still broken and hence optional).

However, given that #6308 passes all checks and is based on this PR, I'd just close this and get that one merged. Up to you folks of course.

This reverts commit c7bc844.

codecov · 2025-06-16T13:46:11Z

Codecov Report

Attention: Patch coverage is 78.37838% with 48 lines in your changes missing coverage. Please review.

Project coverage is 88.78%. Comparing base (ffa4012) to head (da9e79e).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
...earch_rs/encode_decode/src/varint/vector_writer.rs	0.00%	35 Missing ⚠️
src/redisearch_rs/encode_decode/src/varint/mod.rs	95.08%	3 Missing and 6 partials ⚠️
src/varint.c	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6287      +/-   ##
==========================================
- Coverage   88.89%   88.78%   -0.12%     
==========================================
  Files         241      243       +2     
  Lines       40794    41063     +269     
  Branches     3435     3653     +218     
==========================================
+ Hits        36264    36456     +192     
- Misses       4494     4565      +71     
- Partials       36       42       +6

Flag	Coverage Δ
flow	`82.17% <0.00%> (-0.19%)`	⬇️
unit	`46.51% <78.37%> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

github-actions bot added the size:XL label Jun 9, 2025

zeenix requested a review from chesedo June 9, 2025 14:04

zeenix added the action:run-micro-benchmark label Jun 9, 2025

zeenix force-pushed the varint-rs branch from 689b5bf to c5df333 Compare June 9, 2025 17:39