Skip to content

GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0#49296

Merged
pitrou merged 1 commit intoapache:mainfrom
pitrou:delta-zero-opt
Feb 24, 2026
Merged

GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0#49296
pitrou merged 1 commit intoapache:mainfrom
pitrou:delta-zero-opt

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Feb 16, 2026

Rationale for this change

DELTA_BINARY_PACKED decoding has limited performance due to a back-to-back dependency between the computations of value N and value N+1.

However, we can do better if we know that all deltas are 0 in a miniblock. This happens when a miniblock's delta bit width.

What changes are included in this PR?

Avoid reading and accumulating deltas when we the delta bit width is 0. Instead, use a condensed formula that allows to compute a value without waiting for the previous one.

Benchmark results on constant ranges of integers (on my local machine, AMD Zen 2 CPU):

                                                                 benchmark        baseline        contender  change %                                                                                                                                                                                                               counters
                                 BM_DeltaBitPackingDecode_Int32_Fixed/4096   3.821 GiB/sec   12.164 GiB/sec   218.323                              {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70160}
                                BM_DeltaBitPackingDecode_Int32_Fixed/65536   3.897 GiB/sec   12.378 GiB/sec   217.678                              {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4487}
                                BM_DeltaBitPackingDecode_Int32_Fixed/32768   3.909 GiB/sec   12.325 GiB/sec   215.309                              {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9004}
                                 BM_DeltaBitPackingDecode_Int32_Fixed/1024   3.542 GiB/sec   10.468 GiB/sec   195.538                             {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259535}
                                BM_DeltaBitPackingDecode_Int64_Fixed/32768   9.761 GiB/sec   14.040 GiB/sec    43.847                             {'family_index': 12, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11192}
                                BM_DeltaBitPackingDecode_Int64_Fixed/65536   9.814 GiB/sec   14.056 GiB/sec    43.222                              {'family_index': 12, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5617}
                                 BM_DeltaBitPackingDecode_Int64_Fixed/4096   9.672 GiB/sec   13.543 GiB/sec    40.014                              {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88891}
                                 BM_DeltaBitPackingDecode_Int64_Fixed/1024   8.850 GiB/sec   12.207 GiB/sec    37.923                             {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 323720}

Are these changes tested?

Yes, by an additional test meant to stress this specific situation.

Are there any user-facing changes?

No.

@pitrou pitrou marked this pull request as ready for review February 16, 2026 10:09
@pitrou pitrou requested a review from wgtmac as a code owner February 16, 2026 10:09
@pitrou pitrou requested review from mapleFU and rok February 16, 2026 10:09
@pitrou
Copy link
Member Author

pitrou commented Feb 16, 2026

FTR @AntoinePrv :)

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes DELTA_BINARY_PACKED decoding performance when the delta bit-width is 0 (all deltas are zero). The optimization avoids a back-to-back dependency between consecutive value computations by using a closed-form formula, achieving 200-300% throughput improvements for Int32 and 40% for Int64 according to benchmarks.

Changes:

  • Added fast path in decoder when delta_bit_width_ == 0 to compute values without reading from bit stream
  • Added documentation comment in encoder about constant synchronization with tests
  • Refactored test infrastructure to support testing with arbitrary value sequences
  • Added new test specifically targeting zero delta bit-width scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
cpp/src/parquet/decoder.cc Added fast path optimization for zero delta bit-width that computes values using arithmetic formula instead of reading from stream
cpp/src/parquet/encoder.cc Added comment noting that block size constants should be kept in sync with test constants
cpp/src/parquet/encoding_test.cc Refactored test helpers to support const-correctness and arbitrary value sequences; added comprehensive test for zero delta bit-width edge case

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just back from holiday and sorry for late reply

if (delta_bit_width_ == 0) {
// Fast path that avoids a back-to-back dependency between two consecutive
// computations: we know all deltas decode to zero. We actually don't
// even need to decode them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also not need to advance decoder? It's a little bit confusing when reading but it's right

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the bit width is zero, there are zero bytes to decode, so we don't need to advance it.
(also this is the purpose of the additional test)

@github-actions github-actions bot added awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 24, 2026
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Feb 24, 2026
@pitrou
Copy link
Member Author

pitrou commented Feb 24, 2026

Just back from holiday and sorry for late reply

It's ok, and thank you for reviewing :)

@pitrou
Copy link
Member Author

pitrou commented Feb 24, 2026

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: be8b49a

Submitted crossbow builds: ursacomputing/crossbow @ actions-b4edb848a9

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou merged commit 8b50bb1 into apache:main Feb 24, 2026
51 of 54 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Feb 24, 2026
@pitrou pitrou deleted the delta-zero-opt branch February 24, 2026 15:05
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 2 benchmarking runs that have been run so far on merge-commit 8b50bb1.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants