RowSelection::intersection Produces Invalid RowSelection #5036

@tustvold

Describe the bug

An invariant of RowSelection is that its selectors alternate between select and skip, and that it contains no empty RowSelector.

This is typically enforced when a RowSelection is created from a slice (or vec) of RowSelector by from_selectors_and_combine.
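As a rough illustration of the invariant, here is a minimal standalone sketch. `RowSelector` is redefined locally as a stand-in for the parquet type so the snippet compiles on its own, and `obeys_invariant` is a hypothetical helper, not part of the parquet crate.

```rust
/// Local stand-in for parquet's `RowSelector` (assumption: a row count plus a skip flag).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

impl RowSelector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Hypothetical helper: true when no selector is empty and no two
/// neighbouring selectors are of the same kind.
fn obeys_invariant(selectors: &[RowSelector]) -> bool {
    selectors.iter().all(|s| s.row_count > 0)
        && selectors.windows(2).all(|w| w[0].skip != w[1].skip)
}

#[test]
fn invariant_examples() {
    // Alternating and non-empty: valid.
    assert!(obeys_invariant(&[RowSelector::select(3), RowSelector::skip(2)]));
    // Two adjacent skips: invalid.
    assert!(!obeys_invariant(&[RowSelector::skip(3), RowSelector::skip(2)]));
    // An empty selector: invalid.
    assert!(!obeys_invariant(&[RowSelector::select(0)]));
}
```

A check along these lines could serve as a debug assertion wherever a RowSelection is built from raw selectors.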

When intersect_row_selections was imported from DataFusion in #3047 and subsequently exposed as a member function in https://github.com/apache/arrow-rs/pull/3084/files#diff-7638a63d118da0ac5321c1948eb9acfc59f7acee56598879eba8338b2c22ff9eR334, a subtle bug was introduced.

intersect_row_selections does not produce a Vec<RowSelector> that obeys the invariants of RowSelection, and yet the member function doesn't call from_selectors_and_combine.

This results in a RowSelection of the form [Skip(x), Skip(y)]. The async reader determines what data to fetch based on which rows are selected; however, when reading the data it performs each operation in turn. To perform the first skip, the reader must position the decoders within the pages, because it doesn't know that the next operation is another skip. This in turn causes it to request data that wasn't fetched, and the reader bails out with an offset index error.
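To make the [Skip(x), Skip(y)] shape concrete, here is a simplified standalone sketch. It is not the arrow-rs implementation: `RowSelector` is a local stand-in, and `naive_intersection` and `combine` are hypothetical helpers. `combine` only approximates the kind of normalisation from_selectors_and_combine is described as performing, and `naive_intersection` assumes both inputs cover the same number of rows.

```rust
/// Local stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

impl RowSelector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Hypothetical two-pointer intersection: emits one selector per overlapping
/// chunk, selecting a chunk only when both sides select it. Adjacent chunks
/// are not merged, so the output can violate the invariant.
fn naive_intersection(left: &[RowSelector], right: &[RowSelector]) -> Vec<RowSelector> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    let mut l_rem = left.first().map_or(0, |s| s.row_count);
    let mut r_rem = right.first().map_or(0, |s| s.row_count);
    while i < left.len() && j < right.len() {
        let n = l_rem.min(r_rem);
        out.push(RowSelector { row_count: n, skip: left[i].skip || right[j].skip });
        l_rem -= n;
        r_rem -= n;
        if l_rem == 0 {
            i += 1;
            if i < left.len() {
                l_rem = left[i].row_count;
            }
        }
        if r_rem == 0 {
            j += 1;
            if j < right.len() {
                r_rem = right[j].row_count;
            }
        }
    }
    out
}

/// Hypothetical normalisation: drop empty selectors and merge neighbours of the same kind.
fn combine(selectors: &[RowSelector]) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::with_capacity(selectors.len());
    for s in selectors.iter().filter(|s| s.row_count > 0) {
        if let Some(last) = out.last_mut() {
            if last.skip == s.skip {
                last.row_count += s.row_count;
                continue;
            }
        }
        out.push(*s);
    }
    out
}

#[test]
fn adjacent_skips_are_merged() {
    let left = [RowSelector::select(3), RowSelector::skip(3)];
    let right = [RowSelector::skip(3), RowSelector::select(3)];
    let raw = naive_intersection(&left, &right);
    // Each overlapping chunk is skipped by one side, giving two adjacent skips.
    assert_eq!(raw, vec![RowSelector::skip(3), RowSelector::skip(3)]);
    // Normalising restores the alternating form.
    assert_eq!(combine(&raw), vec![RowSelector::skip(6)]);
}
```

In this sketch, normalising the raw output before wrapping it restores the alternating form that the reader relies on.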

To Reproduce

#[test]
fn test_intersection() {
    let selection = RowSelection::from(vec![RowSelector::select(1048576)]);
    let result = selection.intersection(&selection);
    assert_eq!(result, selection);
}

Labels: bug, parquet (changes to the parquet crate)
