DeltaBitPackDecoder Incorrectly Handles Non-Zero MiniBlock Bit Width Padding #1417

@tustvold

Describe the bug

From the Parquet specification:

"If, in the last block, less than <number of miniblocks in a block> miniblocks are needed to store the values, the bytes storing the bit widths of the unneeded miniblocks are still present, their value should be zero, but readers must accept arbitrary values as well."

This is particularly important at the moment because of #1416, which causes the padding to contain arbitrary values when an encoder is reused.

To Reproduce

This is one of the underlying bugs behind apache/datafusion#1976.

Expected behavior

The decoder should ignore the bit widths of padded miniblocks, i.e. the miniblocks in the last block that are not needed to store any values.
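
As a rough illustration of the expected behavior, the sketch below computes how many miniblocks in a block actually carry values and treats the bit widths of the rest as zero. It is not the parquet crate's decoder: the function names `miniblocks_needed` and `effective_bit_widths` and their parameters are hypothetical, and `values_remaining` is assumed to be the number of deltas still to be read when the block header is parsed.

```rust
/// Hypothetical helper: number of miniblocks in the current block that
/// actually hold values, given how many deltas are still to be decoded.
fn miniblocks_needed(
    values_remaining: usize,
    values_per_miniblock: usize,
    miniblocks_per_block: usize,
) -> usize {
    assert!(values_per_miniblock > 0, "miniblock size must be non-zero");
    // Ceiling division: a partially filled miniblock still counts as needed.
    let needed = (values_remaining + values_per_miniblock - 1) / values_per_miniblock;
    // A single block never holds more than `miniblocks_per_block` miniblocks.
    needed.min(miniblocks_per_block)
}

/// Hypothetical helper: returns the bit widths read from the block header
/// with every padding entry forced to zero, so that arbitrary padding
/// bytes can never be validated or used to advance the read offset.
fn effective_bit_widths(
    raw_bit_widths: &[u8],
    values_remaining: usize,
    values_per_miniblock: usize,
) -> Vec<u8> {
    let needed = miniblocks_needed(
        values_remaining,
        values_per_miniblock,
        raw_bit_widths.len(),
    );
    raw_bit_widths
        .iter()
        .enumerate()
        .map(|(i, &width)| if i < needed { width } else { 0 })
        .collect()
}
```

For example, with 4 miniblocks of 32 values each and only 20 values remaining, calling `effective_bit_widths(&[3, 200, 77, 5], 20, 32)` keeps the first width and zeroes the other three, so the out-of-range padding value 200 is never interpreted. Any validation of bit widths (such as rejecting widths larger than the value type) would then apply only to the needed miniblocks.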

Labels: bug, parquet (Changes to the parquet crate)