Conversation
chesedo
left a comment
There was a problem hiding this comment.
Are we okay with the field mask decoding being slower in Rust (when inlining is disabled)? Do we know why it is slower?
Also did the memory usage show they have the same compression?
We don't disable inlining so I don't think we need to be concerned.
Yes. I should have included the output of that here. I'll add that soon. |
My concern is more of, "what if that means the Rust inlined version is slower than the C version too?". We currently only know the inlined Rust version is faster than the non-inlined C version. And since the non-inlined Rust version is slower than the non-inlined C version, then maybe than means Rust is slower when inlined too. |
I understand but given that:
I'm not sure there is anything to worry about here. Besides, the code is identical so I'm not sure what can be done here. |
|
Just (clean) rebased on current master. |
| let mut buf = Vec::with_capacity(1024); | ||
| group.bench_function("Rust", |b| { | ||
| b.iter(|| { | ||
| for &value in values { |
There was a problem hiding this comment.
Should buf be cleared or reset before each value, like in c benchmark?
There was a problem hiding this comment.
The Buffer wrapper is a C-specific thing and not needed for/by Rust so there is nothing to reset.
There was a problem hiding this comment.
Although now that you mention it, perhaps the C benchmark doesn't need to reset the buffer. I'll check..
There was a problem hiding this comment.
Oh, I didn't realize that this is the writer part. It indeed does need to reset at least the offset but only resetting the offset bit does seem to make a small difference (C encoding becomes 10 nanoseconds faster on my machine).
There was a problem hiding this comment.
We should create a fresh buffer for every iteration here (or resetting the writing cursor); otherwise, we end up measuring the growth algorithm of the buffer.
This is the Rust re-implementation of the Varint* API. We'll use this later to replace the C version.
|
All review comments addressed and rebased. |
This reverts commit 6597daa.
|
There is a strange linker error with the benchmarking code when building for coverage. I ran out of time before I could fix it (and I was under the impression that coverage job is still broken and hence optional). However, given that #6308 passes all checks and is based on this PR, I'd just close this and get that one merged. Up to you folks of course. |
This reverts commit c7bc844.
71ac8e0 to
38876ff
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6287 +/- ##
==========================================
- Coverage 88.89% 88.78% -0.12%
==========================================
Files 241 243 +2
Lines 40794 41063 +269
Branches 3435 3653 +218
==========================================
+ Hits 36264 36456 +192
- Misses 4494 4565 +71
- Partials 36 42 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Describe the changes in the pull request
This PR re-implements Varint encoding & decoding in Rust. It also adds benchmarks that compare the
performance and memory usage of C vs Rust implementations.
A recent sample run of the benchmarks on my laptop:
As you can see the performance is similar, except in case of:
Vector writer, which is about 100 ns faster than the C implementation. I suspect this is due to
Rust side using a
Vecand C side using the internal Buffer API and the former being moreefficient than latter at allocations/growth.
Decoder, which is unbelievably faster than the C implementation. The reason is that to be able to
call the C decoder functions from Rust, I had to create non-inlined wrappers around the
static inlineones and hence that removes the advantage of inlining from the C side. If you add#[inline(never)]to the Rust functions, the benchmark results become quite comparable for Varintand Rust's FieldMask decoding becomes 2x slower than its C counterpart:
Side notes:
This PR was split from Varint oxidization #5996, rebased and then benchmarks were added on top.
Once this PR is merged, the next step would be to drop the use of the C implementation in favor of this.
Mark if applicable