feat: cast from/to interval and duration#4020
Conversation
9082dff to
6d58d1c
Compare
arrow-cast/src/cast.rs
Outdated
| array: &dyn Array, | ||
| cast_options: &CastOptions, | ||
| ) -> Result<ArrayRef, ArrowError> { | ||
| let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap(); |
arrow-cast/src/cast.rs
Outdated
| for i in 0..array.len() { | ||
| if array.is_null(i) { | ||
| builder.append_null(); | ||
| } else { |
There was a problem hiding this comment.
For array with no nulls, I think we can have a faster path?
There was a problem hiding this comment.
PrimitiveArray::from_trusted_len_iter may fit here.
arrow-cast/src/cast.rs
Outdated
| builder.append_null(); | ||
| } else { | ||
| let v = array.value(i) as i128; | ||
| let v = v.mul_checked(scale); |
There was a problem hiding this comment.
For scale 1 case, maybe we can skip this call?
arrow-cast/src/cast.rs
Outdated
| return Err(ArrowError::ComputeError(format!( | ||
| "Cannot cast to {:?}. Overflowing on {:?}", | ||
| D::DATA_TYPE, | ||
| e |
There was a problem hiding this comment.
e is ArrowError, so you will get nested ArrowError.
|
LGTM, thanks for the effort. |
arrow-cast/src/cast.rs
Outdated
| if cast_options.safe { | ||
| let iter = array | ||
| .iter() | ||
| .map(|v| v.and_then(|v| ((v as i128).checked_mul(scale)))); |
There was a problem hiding this comment.
I am not sure this code correctly converts v into an IntervalMonthDayNanoType -- the IntervalMonthDayNanoType is represented as two fields but this is treating it as a single native type)
I think we should construct IntervalMonthDayNano with this method:
So something like
let months = 0;
let days = 0;
let nanos: i64 = v.checked_mul(scale)?;
Ok(IntervalMonthDayNanoType::make_value(months, days, nanos))
| #[test] | ||
| fn test_cast_from_duration_to_interval() { | ||
| // from duration second to interval month day nano | ||
| let array = vec![1234567]; |
There was a problem hiding this comment.
Can you also please test a value if i64::MAX here and ensure an error is returned?
| assert_eq!(casted_array.value(0), 0); | ||
|
|
||
| // from interval month day nano to duration millisecond | ||
| let array = vec![1234567]; |
There was a problem hiding this comment.
You could probably refactor this repetition into a function and avoid so much boiler plate in the tests
arrow-cast/src/cast.rs
Outdated
| &CastOptions { safe: false }, | ||
| ) | ||
| .unwrap(); | ||
| assert_eq!(casted_array.value(0), 9223372036854775807000000); |
There was a problem hiding this comment.
So if I convert this value back into fields:
let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(9223372036854775807000000);
println!("months: {months}, days: {days}, nanos: {nanos}");it prints out:
months: 0, days: 499999, nanos: -1000000
Which doesn't seem right for 9223372036854775807 milliseconds
I think what is happening can be explained here (the nanoseconds fields have overflowed into the days)
/*
https://github.com/apache/arrow/blob/02c8598d264c839a5b5cf3109bfd406f3b8a6ba5/cpp/src/arrow/type.h#L1475
struct MonthDayNanos {
int32_t months;
int32_t days;
int64_t nanoseconds;
}
128 112 96 80 64 48 32 16 0
+-------+-------+-------+-------+-------+-------+-------+-------+
| months | days | nanos |
+-------+-------+-------+-------+-------+-------+-------+-------+
*/
I think what should happen is
- (preferred): the cast should return an error if the nanosecond field would have overflowed
- (secondarily): if the nanosecond field overflows the overflow should be converted into days
I think erroring (at least at first) is preferred because it avoids questions like "how many seconds are there in a day" which is not actually a fixed quantity given leap seconds, etc.
If someone has need of casting such durations to intervals, we can then sort out the mechanics
|
Thanks @Weijun-H -- I think this is getting close. From my perspecitive all that is left is handling nanosecond overflow correctly and this will be good to go |
| if cast_options.safe { | ||
| let iter = array | ||
| .iter() | ||
| .map(|v| v.and_then(|v| v.checked_mul(scale).map(|v| v as i128))); |
There was a problem hiding this comment.
I think we need to check the overflow in the safe case as well -- and and set the result to Null / NONE when overflow happened
| ); | ||
| assert_eq!(casted_array.value(0), 1234567000); | ||
|
|
||
| let array = vec![i64::MAX]; |
There was a problem hiding this comment.
Can you also please add a test here (and the cases below) showing what happens when an overflow happens with DEFAULT_CAST_OPTIONS? I would expect the result to be Null
arrow-cast/src/cast.rs
Outdated
| &DEFAULT_CAST_OPTIONS, | ||
| ) | ||
| .unwrap(); | ||
| assert_eq!(casted_array.value(0), 0); |
There was a problem hiding this comment.
I think this check should be that the casted_array.valid(0) is false, right?
alamb
left a comment
There was a problem hiding this comment.
I think this PR is ready to go. I don't totally understand why https://github.com/apache/arrow-rs/pull/4020/files#r1167579187 is not needed, but the test coverage
let array = vec![i64::MAX];
let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
array.clone(),
&DEFAULT_CAST_OPTIONS,
)
.unwrap();
assert!(!casted_array.is_valid(0));
covers the case I was thinking of so 👍
Thank you @Weijun-H
Because v is |
Which issue does this PR close?
Closes #3998
Rationale for this change
Explain #3998
What changes are included in this PR?
Are there any user-facing changes?