Optimize delta binary decoder in the case where bitwidth=0 #9477

Draft
etseidl wants to merge 2 commits into apache:main from etseidl:delta_binary_bit_zero

Conversation

@etseidl
Contributor

@etseidl etseidl commented Feb 25, 2026

Which issue does this PR close?

Rationale for this change

Explore whether we can achieve the speedups seen in arrow-cpp (apache/arrow#49296).

What changes are included in this PR?

Adds special cases to the delta binary packed decoder for miniblocks whose bit width is 0. In such a miniblock every delta equals the block's minimum delta, so the optimization avoids relying on previously decoded values to decode the current ones.
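
To illustrate the idea (this is a minimal sketch, not the code in this PR), the two hypothetical functions below contrast a running-sum decode with a zero-bit-width decode. The names `decode_running_sum` and `decode_zero_width`, the `i64` value type, and the slice-based signatures are assumptions made for the example. When the bit width is 0, every unpacked delta is 0, so value `i` of the miniblock is simply `last + (i + 1) * min_delta`.

```rust
/// General path: a running sum, so each value depends on the one before it.
/// `unpacked` holds the bit-unpacked deltas (already relative to `min_delta`).
/// Returns the last decoded value, or `last` if nothing was decoded.
fn decode_running_sum(last: i64, min_delta: i64, unpacked: &[i64], out: &mut [i64]) -> i64 {
    let mut prev = last;
    for (v, d) in out.iter_mut().zip(unpacked) {
        // Wrapping arithmetic, since overflow is expected to wrap around.
        prev = prev.wrapping_add(min_delta).wrapping_add(*d);
        *v = prev;
    }
    prev
}

/// Zero-bit-width path: every delta equals `min_delta`, so each output
/// is computed from `last` alone and the iterations are independent.
fn decode_zero_width(last: i64, min_delta: i64, out: &mut [i64]) -> i64 {
    if min_delta == 0 {
        // A single repeated value: a plain fill is enough.
        out.fill(last);
        return last;
    }
    for (i, v) in out.iter_mut().enumerate() {
        let steps = (i as i64).wrapping_add(1);
        *v = last.wrapping_add(steps.wrapping_mul(min_delta));
    }
    out.last().copied().unwrap_or(last)
}
```

With `last = 10` and `min_delta = 3`, both functions fill a four-element buffer with 13, 16, 19, 22. The `min_delta == 0` branch plausibly corresponds to the "single value" benchmarks and the other branch to the "increasing value" ones.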

Are these changes tested?

Yes, tests have been added, as well as new benchmarks.

Are there any user-facing changes?

No

github-actions bot added the parquet (Changes to the parquet crate) label Feb 25, 2026
@etseidl
Contributor Author

etseidl commented Feb 25, 2026

Not seeing the huge improvement from arrow-cpp, but still a nice speedup (main takes roughly 1.18x to 1.30x as long as this branch in the new benchmarks), and it doesn't seem to affect cases where the optimization can't be used.

New benchmarks on my workstation (x86 i7-12700K) comparing main (no_opt) to this branch (opt):

group                                                                                no_opt                                 opt
-----                                                                                ------                                 ---
arrow_array_reader/INT32/Decimal128Array/binary packed increasing value              1.18     45.9±0.62µs        ? ?/sec    1.00     39.0±0.40µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed single value                  1.21     45.7±0.22µs        ? ?/sec    1.00     37.9±0.77µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed increasing value              1.18     48.2±0.50µs        ? ?/sec    1.00     40.7±0.65µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed single value                  1.19     48.1±0.19µs        ? ?/sec    1.00     40.5±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed increasing value                         1.24     38.0±1.08µs        ? ?/sec    1.00     30.7±0.18µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed single value                             1.26     37.1±0.23µs        ? ?/sec    1.00     29.4±0.12µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed increasing value                         1.23     32.6±0.46µs        ? ?/sec    1.00     26.6±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed single value                             1.28     32.5±0.19µs        ? ?/sec    1.00     25.3±0.38µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed increasing value                         1.21     35.2±0.15µs        ? ?/sec    1.00     29.0±0.12µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed single value                             1.21     35.2±0.36µs        ? ?/sec    1.00     29.1±0.21µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed increasing value                          1.20     37.5±0.25µs        ? ?/sec    1.00     31.3±0.26µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed single value                              1.27     37.9±0.59µs        ? ?/sec    1.00     30.0±0.18µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed increasing value                        1.22     37.4±0.53µs        ? ?/sec    1.00     30.6±0.10µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed single value                            1.25     37.0±0.16µs        ? ?/sec    1.00     29.6±0.19µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed increasing value                        1.24     33.6±0.18µs        ? ?/sec    1.00     27.0±0.14µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed single value                            1.30     33.5±0.18µs        ? ?/sec    1.00     25.9±0.16µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed increasing value                        1.22     35.3±0.13µs        ? ?/sec    1.00     28.9±0.20µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed single value                            1.21     35.3±0.14µs        ? ?/sec    1.00     29.3±0.39µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed increasing value                         1.19     37.4±0.25µs        ? ?/sec    1.00     31.4±0.17µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed single value                             1.23     37.4±0.19µs        ? ?/sec    1.00     30.3±0.45µs        ? ?/sec

And the rest of the binary packed benchmarks:

group                                                                                no_opt                                 opt
-----                                                                                ------                                 ---
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs     1.01     50.9±0.29µs        ? ?/sec    1.00     50.3±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs    1.01     59.7±0.78µs        ? ?/sec    1.00     59.4±0.82µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs      1.00     51.8±0.22µs        ? ?/sec    1.00     51.7±0.21µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs          1.01     73.9±0.50µs        ? ?/sec    1.00     73.4±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs         1.02    102.4±2.40µs        ? ?/sec    1.00    100.8±1.01µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs           1.00     75.7±0.35µs        ? ?/sec    1.00     75.8±0.36µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs     1.02     53.4±0.57µs        ? ?/sec    1.00     52.4±0.24µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs    1.00     61.2±0.26µs        ? ?/sec    1.00     61.5±0.39µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs      1.00     53.7±0.28µs        ? ?/sec    1.00     53.8±0.32µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs          1.00     79.1±0.86µs        ? ?/sec    1.00     79.0±0.50µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs         1.05    110.0±2.47µs        ? ?/sec    1.00    105.0±1.37µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs           1.01     81.4±0.33µs        ? ?/sec    1.00     80.2±0.23µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                1.00     37.8±0.51µs        ? ?/sec    1.00     37.7±0.68µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs               1.00     48.9±0.21µs        ? ?/sec    1.03     50.2±0.53µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                 1.00     38.7±0.39µs        ? ?/sec    1.00     38.8±0.60µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                     1.01     51.4±0.52µs        ? ?/sec    1.00     50.9±0.72µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                    1.00     81.5±0.94µs        ? ?/sec    1.00     81.3±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                      1.00     53.6±0.22µs        ? ?/sec    1.00     53.6±0.40µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                1.01     38.3±0.28µs        ? ?/sec    1.00     37.8±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs               1.00     47.4±0.33µs        ? ?/sec    1.01     47.8±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                 1.00     39.1±0.18µs        ? ?/sec    1.03     40.3±1.11µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                     1.01     47.4±0.18µs        ? ?/sec    1.00     46.9±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                    1.01     78.3±1.17µs        ? ?/sec    1.00     77.7±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                      1.00     52.2±0.35µs        ? ?/sec    1.00     52.3±0.66µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                1.00     40.2±0.15µs        ? ?/sec    1.01     40.6±0.14µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs               1.00     49.7±0.26µs        ? ?/sec    1.00     49.6±0.23µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                 1.00     41.5±0.13µs        ? ?/sec    1.02     42.3±0.16µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                     1.00     54.8±0.25µs        ? ?/sec    1.02     56.0±0.22µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                    1.01     81.5±0.48µs        ? ?/sec    1.00     80.5±0.70µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                      1.00     56.9±0.31µs        ? ?/sec    1.01     57.4±0.22µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                 1.01     40.2±0.29µs        ? ?/sec    1.00     40.0±0.26µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                1.01     50.7±0.69µs        ? ?/sec    1.00     50.4±0.50µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                  1.00     41.5±0.19µs        ? ?/sec    1.00     41.4±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                      1.00     54.1±0.19µs        ? ?/sec    1.00     54.0±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                     1.00     84.3±0.32µs        ? ?/sec    1.00     84.1±1.11µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                       1.00     57.2±0.69µs        ? ?/sec    1.00     57.3±0.42µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs               1.01     44.7±0.30µs        ? ?/sec    1.00     44.4±0.51µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs              1.01     53.2±0.38µs        ? ?/sec    1.00     52.8±0.60µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                1.00     45.8±0.20µs        ? ?/sec    1.00     45.8±0.61µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                    1.01     59.9±0.27µs        ? ?/sec    1.00     59.5±0.28µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                   1.00     87.4±1.04µs        ? ?/sec    1.00     87.9±0.63µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                     1.01     62.7±0.56µs        ? ?/sec    1.00     62.2±0.51µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs               1.01     38.8±0.67µs        ? ?/sec    1.00     38.5±0.69µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs              1.00     48.2±0.20µs        ? ?/sec    1.00     48.1±0.23µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                1.00     40.1±0.44µs        ? ?/sec    1.00     40.2±0.42µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                    1.01     50.0±0.46µs        ? ?/sec    1.00     49.5±0.49µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                   1.00     77.7±0.34µs        ? ?/sec    1.01     78.1±0.24µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                     1.01     52.6±0.34µs        ? ?/sec    1.00     52.1±0.45µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs               1.00     40.3±0.21µs        ? ?/sec    1.01     40.6±0.15µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs              1.00     49.8±0.67µs        ? ?/sec    1.00     49.7±0.46µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                1.00     41.4±0.15µs        ? ?/sec    1.02     42.1±0.40µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                    1.00     55.5±0.31µs        ? ?/sec    1.01     56.1±0.53µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                   1.02     81.8±0.62µs        ? ?/sec    1.00     80.5±0.41µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                     1.00     57.5±1.01µs        ? ?/sec    1.00     57.3±0.67µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                1.01     42.7±0.52µs        ? ?/sec    1.00     42.4±0.23µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs               1.00     52.3±0.40µs        ? ?/sec    1.00     52.0±0.72µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                 1.00     43.9±0.19µs        ? ?/sec    1.01     44.1±0.20µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                     1.03     57.7±0.89µs        ? ?/sec    1.00     56.1±0.26µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                    1.00     85.9±0.54µs        ? ?/sec    1.00     85.7±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                      1.00     59.9±0.19µs        ? ?/sec    1.01     60.6±0.69µs        ? ?/sec

@etseidl
Contributor Author

etseidl commented Feb 25, 2026

It is currently unknown what impact this optimization will have on the other delta encodings. There could be a good speedup for cases like constant-length strings with DELTA_LENGTH_BYTE_ARRAY (think UUIDs or hashes), as well as long runs of the same prefix, or long runs of strings with no shared prefix, with DELTA_BYTE_ARRAY.
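
To see why those cases would hit the zero-bit-width path, the sketch below computes the bit width needed to pack the deltas of a sequence of lengths. It is a simplified illustration only: the helper name `miniblock_bit_width` is hypothetical, and it treats the whole input as a single miniblock with one minimum delta, whereas the real format stores a minimum delta per block and a bit width per miniblock.

```rust
/// Bit width needed to pack `delta - min_delta` for the consecutive
/// differences of `values`, treating the input as one miniblock
/// (a simplification made for this example).
fn miniblock_bit_width(values: &[i64]) -> u32 {
    // Deltas between neighbouring values; the first value would live in the header.
    let deltas: Vec<i64> = values
        .windows(2)
        .map(|w| w[1].wrapping_sub(w[0]))
        .collect();
    let min_delta = match deltas.iter().copied().min() {
        Some(m) => m,
        None => return 0, // zero or one value: nothing to pack
    };
    // Largest delta after subtracting the minimum, as an unsigned quantity.
    let max_adjusted = deltas
        .iter()
        .map(|d| d.wrapping_sub(min_delta) as u64)
        .max()
        .unwrap_or(0);
    // Number of bits needed to represent `max_adjusted` (0 needs 0 bits).
    u64::BITS - max_adjusted.leading_zeros()
}
```

For example, the lengths of a column of 36-character UUID strings (`[36, 36, 36, 36]`) and a run of all-zero prefix lengths both give a bit width of 0, while lengths such as `[1, 2, 4, 7]` need 2 bits per delta.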


Development

Successfully merging this pull request may close these issues.

Speedup DELTA_BINARY_PACKED decoding when bitwidth is 0
