Support FixedSizeList RowConverter#7705
Conversation
5e19043 to
4551310
Compare
There was a problem hiding this comment.
It should he relatively straightforward to avoid needing to cast and to encode/decode the array directly. This would avoid adding two very heavy dependencies in the form of arrow-cast and by extension arrow-select.
Further, it is actually wasteful to encode fixed size lists in this way, we don't need to var-encode them, we can simply encode the values directly one after each other, much like we do for StructArray.
FixedSizeList RowConverter
|
@tustvold thanks for the feedback! |
The test data doesn't contain 42 value.
Validate that null values are correctly masked out.
Add `DataType::FixedSizeList` support to `RowConverter`. This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion.
4551310 to
667cbdf
Compare
|
Updated to remove casts. |
alamb
left a comment
There was a problem hiding this comment.
Thank you @findepi -- I won't say I fully follow the logic but I did verify the tests and the validation performed and that this seems to follow the pattern of the other converters.
It might also be worth adding a benchmark in arrow/benches/row_format.rs in case someone wants to try to optimize this code more in the future
| } | ||
|
|
||
| pub fn compute_lengths_fixed_size_list( | ||
| tracker: &mut LengthTracker, |
There was a problem hiding this comment.
Just pattern matching, I wonder why this doesn't use the same pattern as List?
Why not
| tracker: &mut LengthTracker, | |
| lengths: &mut [usize], |
(I don't see anything wrong with this I am just curious)
There was a problem hiding this comment.
the caller used to operate on lengths: &mut [usize] and that's why list helper functions have this in their API
the caller has been migrated to tracker: &mut LengthTracker, the helper functions for list hasn't been updated.
if this code was inline in the caller (as for eg structs), it would operate on LengthTracker directly.
I didn't find an example with a normal list to follow. I'd suggest doing this as a follow-up. BTW with the row format so precisely documented, is the format itself set in stone, or subject to change? |
Makes sense
I don't know of any policy / discussion on this topic (so the answer is "I don't know"). Part of the rationale to document the Row Format was that it was a pretty tricky thing to make correct -- I don't remember any rationale about not changing it. I also don't know of any use as a long term interchange format |
|
Thanks again @findepi |
# Which issue does this PR close? none # Rationale for this change This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. # What changes are included in this PR? Add `DataType::FixedSizeList` support to `RowConverter`. # Are there any user-facing changes? No (cherry picked from commit d7fc416)
# Which issue does this PR close? none # Rationale for this change This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. # What changes are included in this PR? Add `DataType::FixedSizeList` support to `RowConverter`. # Are there any user-facing changes? No (cherry picked from commit d7fc416)
none This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. Add `DataType::FixedSizeList` support to `RowConverter`. No (cherry picked from commit d7fc416)
none This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. Add `DataType::FixedSizeList` support to `RowConverter`. No (cherry picked from commit d7fc416)
Which issue does this PR close?
none
Rationale for this change
This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion.
What changes are included in this PR?
Add
DataType::FixedSizeListsupport toRowConverter.Are there any user-facing changes?
No