bug: Fix edge cases in array_slice#14489
Conversation
| // Note, we actually test the contrapositive, index < -length, because negating a | ||
| // negative will panic if the negative is equal to the smallest representable value | ||
| // while negating a positive is always safe. | ||
| if index < (O::zero() - O::one()) * len { |
There was a problem hiding this comment.
If anyone knows a simpler way of negating len, please let me know.
There was a problem hiding this comment.
I played around with it and I could not find anything better than this
39175da to
4219dc9
Compare
This commit fixes the following edge cases in the array_slice function
so that it's semantics match DuckDB:
- When begin < 0 and -begin > length, begin is clamped to the
beginning of the list.
- When step < 0 and begin = end, then the result should be a list with
the single element found at index begin/end.
Fixes apache#10548
4219dc9 to
fdc8b71
Compare
| @@ -1941,12 +1941,12 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -4 | |||
| query ?? | |||
| select array_slice(make_array(1, 2, 3, 4, 5), -7, -2), array_slice(make_array('h', 'e', 'l', 'l', 'o'), -7, -3); | |||
There was a problem hiding this comment.
Verified this is consisent with DuckDB
D select array_slice([1, 2, 3, 4, 5], -7, -2)
;
┌─────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -7, -2) │
│ int32[] │
├─────────────────────────────────────────────────────┤
│ [1, 2, 3, 4] │
└─────────────────────────────────────────────────────┘D select array_slice(['h', 'e', 'l', 'l', 'o'], -7, -3);
┌───────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value('h', 'e', 'l', 'l', 'o'), -7, -3) │
│ varchar[] │
├───────────────────────────────────────────────────────────────┤
│ [h, e, l] │
└───────────────────────────────────────────────────────────────┘
| [1] [h] | ||
|
|
||
| query ?? | ||
| select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -9223372036854775808, 1), list_slice(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'), -9223372036854775808, 1); |
There was a problem hiding this comment.
D select array_slice([1, 2, 3, 4, 5], -2147483648, 1)
;
┌─────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -2147483648, 1) │
│ int32[] │
├─────────────────────────────────────────────────────────────┤
│ [1] │
└─────────────────────────────────────────────────────────────┘|
|
||
| # array_slice scalar function #25 (with negative step and equal indexes) | ||
| query ?? | ||
| select array_slice(make_array(1, 2, 3, 4, 5), 2, 2, -1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), 2, 2, -1); |
There was a problem hiding this comment.
D select array_slice([1, 2, 3, 4, 5], 2, 2, -1);
┌───────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), 2, 2, -1) │
│ int32[] │
├───────────────────────────────────────────────────────┤
│ [2] │
└───────────────────────────────────────────────────────|
|
||
| # array_slice scalar function #24 (with first negative index larger than len) | ||
| query ?? | ||
| select array_slice(make_array(1, 2, 3, 4, 5), -2147483648, 1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), -2147483648, 1); |
There was a problem hiding this comment.
BTW I know this is consistent with the other tests in this file (so we shouldn't change it in this PR), but DataFusion now supports the [] array literal syntax too:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ datafusion-cli
DataFusion CLI v45.0.0
> select array_slice([1, 2, 3, 4, 5], -7, -2);
+-------------------------------------------------------------------------------+
| make_array(Int64(1),Int64(2),Int64(3),Int64(4),Int64(5))[Int64(-7):Int64(-2)] |
+-------------------------------------------------------------------------------+
| [] |
+-------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.049 seconds.| // Note, we actually test the contrapositive, index < -length, because negating a | ||
| // negative will panic if the negative is equal to the smallest representable value | ||
| // while negating a positive is always safe. | ||
| if index < (O::zero() - O::one()) * len { |
There was a problem hiding this comment.
I played around with it and I could not find anything better than this
Thank you! |
This commit fixes the following edge cases in the array_slice function so that it's semantics match DuckDB:
When begin < 0 and -begin > length, begin is clamped to the beginning of the list.
When step < 0 and begin = end, then the result should be a list with the single element found at index begin/end.
Fixes
array_slicecan't correctly handle NULL parameters or some edge cases #10548Which issue does this PR close?
array_slicecan't correctly handle NULL parameters or some edge cases #10548.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the behavior of
array_slicehas changed.