Skip to content

MOD-9769: Implement varint encoding/decoding in Rust#6287

Merged
zeenix merged 30 commits intomasterfrom
varint-rs
Jun 16, 2025
Merged

MOD-9769: Implement varint encoding/decoding in Rust#6287
zeenix merged 30 commits intomasterfrom
varint-rs

Conversation

@zeenix
Copy link
Contributor

@zeenix zeenix commented Jun 9, 2025

Describe the changes in the pull request

This PR re-implements Varint encoding & decoding in Rust. It also adds benchmarks that compare the
performance and memory usage of C vs Rust implementations.

A recent sample run of the benchmarks on my laptop:

❯ cargo bench

Varint|Encode/Rust      time:   [660.06 ns 662.24 ns 665.10 ns]
                        change: [+0.9330% +2.4229% +3.9848%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
Varint|Encode/C         time:   [665.43 ns 666.27 ns 667.36 ns]
                        change: [+2.6780% +4.0123% +5.3601%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

Varint|Decode/Rust      time:   [135.24 ns 135.37 ns 135.50 ns]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
Varint|Decode/C         time:   [319.02 ns 319.54 ns 320.19 ns]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  7 (7.00%) high severe

Varint|Encode FieldMask/Rust
                        time:   [672.07 ns 673.37 ns 675.06 ns]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe
Varint|Encode FieldMask/C
                        time:   [668.89 ns 670.55 ns 672.26 ns]
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

Varint|Decode FieldMask/Rust
                        time:   [239.62 ns 239.89 ns 240.21 ns]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Varint|Decode FieldMask/C
                        time:   [351.79 ns 352.80 ns 353.91 ns]
Found 19 outliers among 100 measurements (19.00%)
  8 (8.00%) low severe
  2 (2.00%) high mild
  9 (9.00%) high severe

Varint Vector Writer|Vector Writer/Rust
                        time:   [582.04 ns 585.19 ns 588.42 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Varint Vector Writer|Vector Writer/C
                        time:   [658.85 ns 660.69 ns 662.67 ns]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

As you can see the performance is similar, except in case of:

  • Vector writer, which is about 100 ns faster than the C implementation. I suspect this is due to
    Rust side using a Vec and C side using the internal Buffer API and the former being more
    efficient than latter at allocations/growth.

  • Decoder, which is unbelievably faster than the C implementation. The reason is that to be able to
    call the C decoder functions from Rust, I had to create non-inlined wrappers around the
    static inline ones and hence that removes the advantage of inlining from the C side. If you add
    #[inline(never)] to the Rust functions, the benchmark results become quite comparable for Varint
    and Rust's FieldMask decoding becomes 2x slower than its C counterpart:

Varint|Decode/Rust      time:   [309.14 ns 310.67 ns 312.31 ns]
                        change: [+126.58% +129.49% +132.24%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Varint|Decode/C         time:   [310.92 ns 311.30 ns 311.77 ns]
                        change: [-3.3777% -1.5450% +1.1390%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

Varint|Decode FieldMask/Rust
                        time:   [695.86 ns 696.72 ns 697.73 ns]
                        change: [+186.12% +190.35% +193.93%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  14 (14.00%) high severe
Varint|Decode FieldMask/C
                        time:   [352.47 ns 353.46 ns 354.49 ns]
                        change: [-1.0641% +0.2702% +1.6579%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

Side notes:

  • This PR was split from Varint oxidization #5996, rebased and then benchmarks were added on top.

  • Once this PR is merged, the next step would be to drop the use of the C implementation in favor of this.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Copy link
Collaborator

@chesedo chesedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)? Do we know why it is slower?

Also did the memory usage show they have the same compression?

@zeenix
Copy link
Contributor Author

zeenix commented Jun 10, 2025

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)?

We don't disable inlining so I don't think we need to be concerned.

Also did the memory usage show they have the same compression?

Yes. I should have included the output of that here. I'll add that soon.

@chesedo
Copy link
Collaborator

chesedo commented Jun 10, 2025

Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)?

We don't disable inlining so I don't think we need to be concerned.

My concern is more of, "what if that means the Rust inlined version is slower than the C version too?". We currently only know the inlined Rust version is faster than the non-inlined C version. And since the non-inlined Rust version is slower than the non-inlined C version, then maybe than means Rust is slower when inlined too.

@zeenix
Copy link
Contributor Author

zeenix commented Jun 10, 2025

My concern is more of, "what if that means the Rust inlined version is slower than the C version too?". We currently only know the inlined Rust version is faster than the non-inlined C version. And since the non-inlined Rust version is slower than the non-inlined C version, then maybe than means Rust is slower when inlined too.

I understand but given that:

  • We would never put #[inline(never)] on the Rust functions and hence always let the compiler+linker inline where it makes sense.
  • We're talking a few hundred nanoseconds for millions of iterations here in the worst case scenario.

I'm not sure there is anything to worry about here. Besides, the code is identical so I'm not sure what can be done here.

@zeenix zeenix requested review from GuyAv46, chesedo and dor-forer June 11, 2025 09:33
chesedo
chesedo previously approved these changes Jun 11, 2025
@zeenix
Copy link
Contributor Author

zeenix commented Jun 11, 2025

Just (clean) rebased on current master.

let mut buf = Vec::with_capacity(1024);
group.bench_function("Rust", |b| {
b.iter(|| {
for &value in values {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should buf be cleared or reset before each value, like in c benchmark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Buffer wrapper is a C-specific thing and not needed for/by Rust so there is nothing to reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although now that you mention it, perhaps the C benchmark doesn't need to reset the buffer. I'll check..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't realize that this is the writer part. It indeed does need to reset at least the offset but only resetting the offset bit does seem to make a small difference (C encoding becomes 10 nanoseconds faster on my machine).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should create a fresh buffer for every iteration here (or resetting the writing cursor); otherwise, we end up measuring the growth algorithm of the buffer.

zeenix added 3 commits June 12, 2025 12:54
This is the Rust re-implementation of the Varint* API. We'll use this
later to replace the C version.
@zeenix
Copy link
Contributor Author

zeenix commented Jun 12, 2025

All review comments addressed and rebased.

@zeenix zeenix requested review from GuyAv46, chesedo and dor-forer June 12, 2025 10:59
dor-forer
dor-forer previously approved these changes Jun 12, 2025
GuyAv46
GuyAv46 previously approved these changes Jun 12, 2025
chesedo
chesedo previously approved these changes Jun 12, 2025
@zeenix zeenix dismissed stale reviews from chesedo, GuyAv46, and dor-forer via c7bc844 June 12, 2025 15:18
@zeenix
Copy link
Contributor Author

zeenix commented Jun 12, 2025

There is a strange linker error with the benchmarking code when building for coverage. I ran out of time before I could fix it (and I was under the impression that coverage job is still broken and hence optional).

However, given that #6308 passes all checks and is based on this PR, I'd just close this and get that one merged. Up to you folks of course.

@codecov
Copy link

codecov bot commented Jun 16, 2025

Codecov Report

Attention: Patch coverage is 78.37838% with 48 lines in your changes missing coverage. Please review.

Project coverage is 88.78%. Comparing base (ffa4012) to head (da9e79e).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
...earch_rs/encode_decode/src/varint/vector_writer.rs 0.00% 35 Missing ⚠️
src/redisearch_rs/encode_decode/src/varint/mod.rs 95.08% 3 Missing and 6 partials ⚠️
src/varint.c 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6287      +/-   ##
==========================================
- Coverage   88.89%   88.78%   -0.12%     
==========================================
  Files         241      243       +2     
  Lines       40794    41063     +269     
  Branches     3435     3653     +218     
==========================================
+ Hits        36264    36456     +192     
- Misses       4494     4565      +71     
- Partials       36       42       +6     
Flag Coverage Δ
flow 82.17% <0.00%> (-0.19%) ⬇️
unit 46.51% <78.37%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zeenix zeenix added this pull request to the merge queue Jun 16, 2025
Merged via the queue into master with commit 4940349 Jun 16, 2025
21 of 22 checks passed
@zeenix zeenix deleted the varint-rs branch June 16, 2025 17:04
@LukeMathWalker LukeMathWalker restored the varint-rs branch February 6, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants