Skip to content

Parquet LevelEncoder is much too eager to write short rle runs #7739

@jhorstmann

Description

@jhorstmann

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While looking into parquet loading performance, I noticed that the definition levels for simple nullable columns often contain very short rle runs. These seem to mess with branch prediction when decoding and decrease performance.

I managed to reproduce this in a benchmark by creating a custom RleEncoder for a bitwidth of 1, which only writes bitpacked runs. With 90% of bits being set, these are the results:

rle_decoder/decode_bitpacked
                        time:   [121.83 µs 121.98 µs 122.13 µs]
                        thrpt:  [8.5857 Gelem/s 8.5966 Gelem/s 8.6068 Gelem/s]
rle_decoder/decode_hybrid
                        time:   [661.33 µs 662.31 µs 663.56 µs]
                        thrpt:  [1.5802 Gelem/s 1.5832 Gelem/s 1.5856 Gelem/s]

Note the decoder in this benchmark is the same for both cases, the only difference is how the levels are serialized.

I also hacked some statistics into the decoder. For the above benchmark these are the distributions of rle vs bitpacked encoded runs. On average, the rle runs are only 16 elements long, and the bitpacked only encoding even seems to be shorter:

[parquet/benches/rle.rs:292:9] bitpacked_data.len() = 135169
[parquet/benches/rle.rs:292:9] &bitpacked_stats = HybridRleStats {
    num_values: 1048576,
    num_rle_values: 0,
    num_rle_runs: 0,
    num_bitpacked_values: 1048576,
    num_bitpacked_runs: 2048,
}
[parquet/benches/rle.rs:296:9] hybrid_data.len() = 157050
[parquet/benches/rle.rs:296:9] &hybrid_stats = HybridRleStats {
    num_values: 1048576,
    num_rle_values: 501915,
    num_rle_runs: 29541,
    num_bitpacked_values: 546661,
    num_bitpacked_runs: 29541,
}

Describe the solution you'd like

I'd like the RleEncoder to be smarter about switching between bitpacked and rle runs. Maybe this is a one-line change in encodings/rle.rs, but I find the logic there a bit hard to follow. Alternatives could be a specialized implementation for single-bit levels, or making this configurable for the user.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementAny new improvement worthy of a entry in the changelog

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions