Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In DataFusion we are trying to add interval support, including for data that comes from non-SQL sources. This data can contain intervals of type:
DataType::Interval(IntervalUnit::YearMonth)
DataType::Interval(IntervalUnit::DayTime)
DataType::Interval(IntervalUnit::MonthDayNano)
However, there is currently no easy way to convert between these types.
Describe the solution you'd like
Add support for casting intervals to the cast kernel: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast.rs#L18-L36
The following casts should always be supported, as they are lossless:
Interval(YearMonth) -> DataType::Interval(MonthDayNano)
Interval(DayTime) -> DataType::Interval(MonthDayNano)
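As a rough illustration of why these two widening casts are lossless, here is a minimal sketch in plain Rust with no arrow dependency. The `MonthDayNano` struct and both helper functions are hypothetical stand-ins, not arrow-rs APIs. The sketch assumes the Arrow value layouts: an `i32` month count for YearMonth, `i32` days plus `i32` milliseconds for DayTime, and `i32` months, `i32` days and `i64` nanoseconds for MonthDayNano.

```rust
/// Hypothetical stand-in for one `Interval(MonthDayNano)` value
/// (not the arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanos: i64,
}

/// YearMonth stores a single i32 month count, which fits the
/// `months` field unchanged.
pub fn year_month_to_month_day_nano(months: i32) -> MonthDayNano {
    MonthDayNano { months, days: 0, nanos: 0 }
}

/// DayTime stores i32 days and i32 milliseconds. Even i32::MAX
/// milliseconds scaled to nanoseconds fits comfortably in an i64,
/// so the multiplication cannot overflow.
pub fn day_time_to_month_day_nano(days: i32, millis: i32) -> MonthDayNano {
    MonthDayNano {
        months: 0,
        days,
        nanos: i64::from(millis) * 1_000_000,
    }
}
```

For instance, "1 year 5 months" is 17 months and maps to `{ months: 17, days: 0, nanos: 0 }`, while "2 days 7 milliseconds" maps to `{ months: 0, days: 2, nanos: 7_000_000 }`.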
These casts should not be supported, as the data ranges are different:
Interval(YearMonth) -> Interval(DayTime)
Interval(DayTime) -> Interval(YearMonth)
These casts should behave like the other timestamp kernels:
DataType::Interval(MonthDayNano) -> Interval(YearMonth)
DataType::Interval(MonthDayNano) -> Interval(DayTime)
Specifically, they should truncate (silently) but error on overflow (see example below).
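One possible reading of "truncate silently, error on overflow" for the narrowing casts is sketched below in plain Rust with no arrow dependency. Everything here is hypothetical: the `MonthDayNano` struct and the function names are illustrative, not arrow-rs APIs. Dropping days and nanoseconds when going to YearMonth, and treating a non-zero month count as an error when going to DayTime, are assumptions about the desired semantics rather than settled behavior.

```rust
/// Hypothetical stand-in for one `Interval(MonthDayNano)` value
/// (not the arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanos: i64,
}

/// Assumed semantics: keep the month count and silently drop the
/// day and nanosecond components.
pub fn month_day_nano_to_year_month(v: MonthDayNano) -> i32 {
    v.months
}

/// Assumed semantics: sub-millisecond precision is truncated silently;
/// months (which DayTime cannot represent) and millisecond counts
/// outside the i32 range are reported as errors.
/// Returns `(days, milliseconds)` on success.
pub fn month_day_nano_to_day_time(v: MonthDayNano) -> Result<(i32, i32), String> {
    if v.months != 0 {
        return Err(format!("{} months cannot be represented as DayTime", v.months));
    }
    // Integer division discards any remainder below one millisecond.
    let millis = v.nanos / 1_000_000;
    if millis < i64::from(i32::MIN) || millis > i64::from(i32::MAX) {
        return Err(format!("{} milliseconds overflows DayTime", millis));
    }
    Ok((v.days, millis as i32))
}
```

Under these assumptions, `{ months: 0, days: 2, nanos: 7_000_000 }` becomes `(2, 7)` as DayTime, whereas `{ months: 2_000_000, days: 0, nanos: 0 }` is rejected for DayTime but converts to a YearMonth value of 2,000,000 months.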
Describe alternatives you've considered
Additional context
Example desired Interval casts
use arrow::array::{IntervalDayTimeArray, IntervalMonthDayNanoArray, IntervalYearMonthArray};
use arrow::datatypes::{IntervalDayTimeType, IntervalMonthDayNanoType, IntervalYearMonthType};

fn cast_interval_units() {
    // want to be able to cast to/from different interval units
    let interval_year_month = IntervalYearMonthArray::from(vec![
        // 1 year 5 months
        Some(IntervalYearMonthType::make_value(1, 5)),
        None,
    ]);

    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(YearMonth) to Interval(MonthDayNano) not supported")', src/main.rs:55:112
    // let interval_month_day_nanos = cast(&interval_year_month, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();

    let interval_day_time = IntervalDayTimeArray::from(vec![
        // 2 days 7 milliseconds
        Some(IntervalDayTimeType::make_value(2, 7)),
        None,
    ]);

    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(DayTime) to Interval(MonthDayNano) not supported")', src/main.rs:65:110
    // let interval_month_day_nanos = cast(&interval_day_time, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();

    // Somewhat trickier is how to go from MonthDayNano back to lower precision intervals:
    let interval_month_day_nano = IntervalMonthDayNanoArray::from(vec![
        // 1 month 5 days 0 nanoseconds
        // (could losslessly cast to Interval(YearMonth) but not Interval(DayTime))
        Some(IntervalMonthDayNanoType::make_value(1, 5, 0)),
        // 0 months 2 days and 7 milliseconds
        // (could losslessly cast to Interval(DayTime) but not Interval(YearMonth))
        Some(IntervalMonthDayNanoType::make_value(0, 2, 7 * 1_000_000)),
        // 2M months would overflow Interval(DayTime) but not Interval(YearMonth)
        Some(IntervalMonthDayNanoType::make_value(2_000_000, 0, 0)),
        None,
    ]);
}
Example timestamp cast behavior
use arrow::array::{TimestampMicrosecondArray, TimestampNanosecondArray};
use arrow::compute::{cast, cast_with_options, CastOptions};
use arrow::datatypes::{DataType, TimeUnit};
use arrow::util::pretty::pretty_format_columns;

fn cast_timestamps() {
    let arr = TimestampNanosecondArray::from(vec![
        1_000,
        1_000_000,
        1_000_000_000,
        1_000_000_000_000,
        1_000_000_000_000_000,
        1_000_000_000_000_000_000,
        2_000_000_000_000_000_000,
    ]);

    let types = vec![
        DataType::Date32,
        DataType::Date64,
        DataType::Timestamp(TimeUnit::Second, None),
        DataType::Timestamp(TimeUnit::Millisecond, None),
        DataType::Timestamp(TimeUnit::Microsecond, None),
        DataType::Timestamp(TimeUnit::Nanosecond, None),
    ];

    // casting to a coarser unit truncates silently
    for dt in &types {
        let out = cast(&arr, dt).unwrap();
        let col_title = format!("{:?}", dt);
        println!("output:\n{}",
            pretty_format_columns(&col_title, &[out]).unwrap()
        );
    }

    // however, error / null on overflow
    let arr = TimestampMicrosecondArray::from(vec![
        2_000_000_000_000_000_000,
    ]);

    // the default (safe) cast turns the overflowing value into a null
    let out = cast(&arr, &DataType::Timestamp(TimeUnit::Nanosecond, None)).unwrap();
    println!("output:\n{}",
        pretty_format_columns("ms -> ns", &[out]).unwrap()
    );

    // with `safe: false` the same cast returns an error, so this unwrap panics
    let out = cast_with_options(
        &arr,
        &DataType::Timestamp(TimeUnit::Nanosecond, None),
        &CastOptions {safe: false},
    ).unwrap();
}
Produces output:
output:
+------------+
| Date32     |
+------------+
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-12 |
| 2001-09-09 |
| 2033-05-18 |
+------------+
output:
+-------------------------+
| Date64                  |
+-------------------------+
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:00.001 |
| 1970-01-01T00:00:01     |
| 1970-01-01T00:16:40     |
| 1970-01-12T13:46:40     |
| 2001-09-09T01:46:40     |
| 2033-05-18T03:33:20     |
+-------------------------+
output:
+-------------------------+
| Timestamp(Second, None) |
+-------------------------+
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:01     |
| 1970-01-01T00:16:40     |
| 1970-01-12T13:46:40     |
| 2001-09-09T01:46:40     |
| 2033-05-18T03:33:20     |
+-------------------------+
output:
+------------------------------+
| Timestamp(Millisecond, None) |
+------------------------------+
| 1970-01-01T00:00:00          |
| 1970-01-01T00:00:00.001      |
| 1970-01-01T00:00:01          |
| 1970-01-01T00:16:40          |
| 1970-01-12T13:46:40          |
| 2001-09-09T01:46:40          |
| 2033-05-18T03:33:20          |
+------------------------------+
output:
+------------------------------+
| Timestamp(Microsecond, None) |
+------------------------------+
| 1970-01-01T00:00:00.000001   |
| 1970-01-01T00:00:00.001      |
| 1970-01-01T00:00:01          |
| 1970-01-01T00:16:40          |
| 1970-01-12T13:46:40          |
| 2001-09-09T01:46:40          |
| 2033-05-18T03:33:20          |
+------------------------------+
output:
+-----------------------------+
| Timestamp(Nanosecond, None) |
+-----------------------------+
| 1970-01-01T00:00:00.000001  |
| 1970-01-01T00:00:00.001     |
| 1970-01-01T00:00:01         |
| 1970-01-01T00:16:40         |
| 1970-01-12T13:46:40         |
| 2001-09-09T01:46:40         |
| 2033-05-18T03:33:20         |
+-----------------------------+
output:
+----------+
| ms -> ns |
+----------+
|          |
+----------+
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ComputeError("Overflow happened on: 2000000000000000000 * 1000")', src/main.rs:96:7
stack backtrace:
0: rust_begin_unwind
at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/panicking.rs:64:14
2: core::result::unwrap_failed
at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/result.rs:1790:5
3: core::result::Result<T,E>::unwrap
at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/result.rs:1112:23
4: rust_arrow_playground::cast_timestamps
at ./src/main.rs:92:15
5: rust_arrow_playground::main
at ./src/main.rs:16:5
6: core::ops::function::FnOnce::call_once
at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
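The truncate-versus-overflow behavior in the output above comes down to integer arithmetic on the underlying `i64` values. The following is a minimal sketch in plain Rust, not the arrow-rs implementation. The function names are hypothetical, and the `safe` parameter only mimics the role of `CastOptions { safe }` as seen in the example: a null (`None`) when `safe` is true, an error when it is false.

```rust
/// Going to a coarser unit divides, silently discarding the remainder
/// (1_000 ns becomes 0 s). Negative values are not modeled here.
pub fn nanos_to_seconds(nanos: i64) -> i64 {
    nanos / 1_000_000_000
}

/// Going to a finer unit multiplies, which can overflow an i64.
/// With `safe == true` an overflow yields `Ok(None)` (a null slot);
/// with `safe == false` it yields an error.
pub fn micros_to_nanos(micros: i64, safe: bool) -> Result<Option<i64>, String> {
    match micros.checked_mul(1_000) {
        Some(nanos) => Ok(Some(nanos)),
        None if safe => Ok(None),
        None => Err(format!("Overflow happened on: {} * 1000", micros)),
    }
}
```

With the value from the example, `micros_to_nanos(2_000_000_000_000_000_000, true)` gives `Ok(None)`, matching the empty row in the `ms -> ns` table, and the same call with `safe` set to `false` gives an `Err` carrying the overflow message seen in the panic.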