GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0#49296
GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0#49296pitrou merged 1 commit intoapache:mainfrom
Conversation
3e048a6 to
ec8c4a3
Compare
|
FTR @AntoinePrv :) |
There was a problem hiding this comment.
Pull request overview
This PR optimizes DELTA_BINARY_PACKED decoding performance when the delta bit-width is 0 (all deltas are zero). The optimization avoids a back-to-back dependency between consecutive value computations by using a closed-form formula, achieving 200-300% throughput improvements for Int32 and 40% for Int64 according to benchmarks.
Changes:
- Added fast path in decoder when
delta_bit_width_ == 0to compute values without reading from bit stream - Added documentation comment in encoder about constant synchronization with tests
- Refactored test infrastructure to support testing with arbitrary value sequences
- Added new test specifically targeting zero delta bit-width scenarios
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cpp/src/parquet/decoder.cc | Added fast path optimization for zero delta bit-width that computes values using arithmetic formula instead of reading from stream |
| cpp/src/parquet/encoder.cc | Added comment noting that block size constants should be kept in sync with test constants |
| cpp/src/parquet/encoding_test.cc | Refactored test helpers to support const-correctness and arbitrary value sequences; added comprehensive test for zero delta bit-width edge case |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mapleFU
left a comment
There was a problem hiding this comment.
Just back from holiday and sorry for late reply
| if (delta_bit_width_ == 0) { | ||
| // Fast path that avoids a back-to-back dependency between two consecutive | ||
| // computations: we know all deltas decode to zero. We actually don't | ||
| // even need to decode them. |
There was a problem hiding this comment.
And also not need to advance decoder? It's a little bit confusing when reading but it's right
There was a problem hiding this comment.
Since the bit width is zero, there are zero bytes to decode, so we don't need to advance it.
(also this is the purpose of the additional test)
ec8c4a3 to
be8b49a
Compare
It's ok, and thank you for reviewing :) |
|
@github-actions crossbow submit -g cpp |
|
Revision: be8b49a Submitted crossbow builds: ursacomputing/crossbow @ actions-b4edb848a9 |
|
After merging your PR, Conbench analyzed the 2 benchmarking runs that have been run so far on merge-commit 8b50bb1. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
DELTA_BINARY_PACKED decoding has limited performance due to a back-to-back dependency between the computations of value N and value N+1.
However, we can do better if we know that all deltas are 0 in a miniblock. This happens when a miniblock's delta bit width.
What changes are included in this PR?
Avoid reading and accumulating deltas when we the delta bit width is 0. Instead, use a condensed formula that allows to compute a value without waiting for the previous one.
Benchmark results on constant ranges of integers (on my local machine, AMD Zen 2 CPU):
Are these changes tested?
Yes, by an additional test meant to stress this specific situation.
Are there any user-facing changes?
No.