Description
Describe the bug
An invariant of RowSelection is that it alternates between select and skip, and never contains an empty RowSelector.
This is typically enforced when a RowSelection is created from a slice (or vec) of RowSelector by from_selectors_and_combine.
When intersect_row_selections was imported from DataFusion in #3047 and subsequently exposed as a member function in https://github.com/apache/arrow-rs/pull/3084/files#diff-7638a63d118da0ac5321c1948eb9acfc59f7acee56598879eba8338b2c22ff9eR334, a subtle bug was introduced.
intersect_row_selections does not produce a Vec<RowSelector> that obeys the invariants of RowSelection, and yet the member function doesn't call from_selectors_and_combine.
This results in a RowSelection of the form [Skip(x), Skip(y)]. The async reader determines what data to fetch based on which rows are selected; however, when reading the data it performs each operation in turn. To perform the first skip, the reader must set up the decoders at the relevant position within the pages, because it doesn't know that the next operation is another skip. This in turn causes it to request data that wasn't fetched, and the reader bails out with an offset index error.
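To make the invariant concrete, here is a minimal sketch of the kind of normalization that the output of intersect_row_selections would need. It is not the parquet crate's code: `Selector`, `normalize` and `is_normalized` are hypothetical names defined locally so the snippet compiles on its own, and the merge logic is only an assumption about what "alternating, non-empty" requires.

```rust
/// Hypothetical stand-in for `RowSelector`, defined locally so this sketch
/// does not depend on the parquet crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

impl Selector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Drops empty selectors and merges adjacent selectors of the same kind,
/// so the result alternates between select and skip.
fn normalize(selectors: &[Selector]) -> Vec<Selector> {
    let mut out: Vec<Selector> = Vec::with_capacity(selectors.len());
    for s in selectors.iter().filter(|s| s.row_count != 0) {
        if let Some(last) = out.last_mut() {
            if last.skip == s.skip {
                // Same kind as the previous selector: extend it instead of
                // pushing a second one.
                last.row_count += s.row_count;
                continue;
            }
        }
        out.push(*s);
    }
    out
}

/// Returns true if no selector is empty and no two neighbours share a kind.
fn is_normalized(selectors: &[Selector]) -> bool {
    selectors.iter().all(|s| s.row_count != 0)
        && selectors.windows(2).all(|w| w[0].skip != w[1].skip)
}
```

Under these assumptions, an input such as [Skip(3), Skip(4)] fails `is_normalized`, and `normalize` collapses it to [Skip(7)], which is the shape the reader expects.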
To Reproduce
#[test]
fn test_intersection() {
    // Intersecting a selection with itself should return the same selection.
    let selection = RowSelection::from(vec![RowSelector::select(1048576)]);
    let result = selection.intersection(&selection);
    assert_eq!(result, selection);
}
Expected behavior
Additional context