Add interleave kernel (#1523)#2838
Conversation
| use arrow_schema::ArrowError; | ||
|
|
||
| /// | ||
| /// Takes elements by index from a list of [`Array`], creating a new [`Array`] from those values. |
There was a problem hiding this comment.
❤️ for the ascii art (lol though I am biased)
cc @Dandandan and @yjshen
There was a problem hiding this comment.
You created it for the original ticket 😂
There was a problem hiding this comment.
Yes, nothing like a little self praise of my monodraw skillz to lighten up the review process
| /// values array 1 | ||
| /// ``` | ||
| /// | ||
| /// For selecting values by index from a single array see [compute::take](crate::compute::take) |
| /// | ||
| /// For selecting values by index from a single array see [compute::take](crate::compute::take) | ||
| /// | ||
| /// # Panics |
There was a problem hiding this comment.
The function appears to return an error (rather than panic) in these two cases
| "It is not possible to interleave arrays of different data types." | ||
| .to_string(), |
There was a problem hiding this comment.
| "It is not possible to interleave arrays of different data types." | |
| .to_string(), | |
| format!("It is not possible to interleave arrays of different data types ({:?} and {:?})" | |
| array.data_type(), data_type) |
| return Ok(new_empty_array(data_type)); | ||
| } | ||
|
|
||
| // TODO: Add specialized implementations (#1523) |
There was a problem hiding this comment.
It seems like the specialized implementation should be tracked by a different ticket than the initial kernel, right?
| } | ||
|
|
||
| if indices.is_empty() { | ||
| return Ok(new_empty_array(data_type)); |
| fn test_primitive_empty() { | ||
| let a = Int32Array::from_iter_values([1, 2, 3, 4]); | ||
| let v = interleave(&[&a], &[]).unwrap(); | ||
| assert!(v.is_empty()); |
There was a problem hiding this comment.
| assert!(v.is_empty()); | |
| assert!(v.is_empty()); | |
| assert!(v.data_type(), DataType::Int32); |
| let b = Int32Array::from_iter_values([5, 6, 7]); | ||
| let c = Int32Array::from_iter_values([8, 9, 10]); | ||
| let values = | ||
| interleave(&[&a, &b, &c], &[(0, 3), (0, 3), (2, 2), (2, 0), (1, 1)]).unwrap(); |
There was a problem hiding this comment.
👍 for checking repeated indexes (0,3) and (0,3)
| return Err(ArrowError::InvalidArgumentError( | ||
| "interleave requires input of at least one array".to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
Should we also return an error for single array case and suggest to use compute::take?
There was a problem hiding this comment.
I vacillated a bit on this, I think given the concat kernel which makes even less sense to be called on a single array, doesn't error in this case - I would be inclined to leave it.
There was a problem hiding this comment.
I agree that it is nicer to support handling a single input (mostly as a convenience for downstream users so they don't have to special case their code for len() = 1 case)
|
Parquet failure is unrelated |
|
Benchmark runs are scheduled for baseline = 8adebca and contender = fa1d079. fa1d079 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #1523
Rationale for this change
When implementing sorts, joins, etc... it is fairly common to need a method to select values from multiple source arrays and interleave them into a single output array.
I opted to call this
interleaveinstead oftake_multi, etc... as it has a non-trivially different signature, I think serves a sufficiently different use-case, and it isn't immediately obvious that themultiintake_multirefers to multiple source arrays, and not something different.What changes are included in this PR?
Adds an interleave kernel that allows interleaving values from multiple source arrays. This is an MVP implementation, largely lifted from SortPreservingMerge in DataFusion. Future PRs will look to optimise it
Are there any user-facing changes?
No