Support casting to/from different interval units (eg YearDay --> MonthDayNano, etc) #3959

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In DataFusion we are trying to add interval support, including from data that comes from non-SQL sources. This data can have Intervals of type:

DataType::Interval(IntervalUnit::YearMonth)
DataType::Interval(IntervalUnit::DayTime)
DataType::Interval(IntervalUnit::MonthDayNano)

However, there is currently no easy way to convert between these types.

Describe the solution you'd like
Add support for casting intervals to the cast kernel: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast.rs#L18-L36

The following casts should always be supported, as they are lossless:

  • Interval(YearMonth) -> DataType::Interval(MonthDayNano)
  • Interval(DayTime) -> DataType::Interval(MonthDayNano)
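To illustrate why these two widening casts lose nothing, here is a minimal sketch of the arithmetic in plain Rust. It does not use the arrow crate: the `MonthDayNano` struct and both helper functions are hypothetical names introduced only for this example, and the field layout (total months as `i32`; days and milliseconds as `i32`; nanoseconds as `i64`) is assumed from the Arrow interval definitions.

```rust
/// Hypothetical stand-in for one IntervalMonthDayNano value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanos: i64,
}

/// YearMonth stores a single count of months, so it maps directly onto
/// the months field; days and nanoseconds are zero.
fn year_month_to_month_day_nano(total_months: i32) -> MonthDayNano {
    MonthDayNano { months: total_months, days: 0, nanos: 0 }
}

/// DayTime stores days and milliseconds. Widening milliseconds to i64
/// before multiplying means the conversion cannot overflow.
fn day_time_to_month_day_nano(days: i32, millis: i32) -> MonthDayNano {
    MonthDayNano {
        months: 0,
        days,
        nanos: i64::from(millis) * 1_000_000,
    }
}

fn main() {
    // 1 year 5 months is 17 months in total.
    println!("{:?}", year_month_to_month_day_nano(12 + 5));
    // 2 days 7 milliseconds.
    println!("{:?}", day_time_to_month_day_nano(2, 7));
}
```

Even `i32::MAX` milliseconds is only about 2.1e15 nanoseconds, well inside the `i64` range, which is why neither direction needs an overflow check.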

These casts should not be supported, as the data ranges are different:

  • Interval(YearMonth) -> Interval(DayTime)
  • Interval(DayTime) -> Interval(YearMonth)

These casts should behave like the other timestamp kernels

  • DataType::Interval(MonthDayNano) -> Interval(YearMonth)
  • DataType::Interval(MonthDayNano) -> Interval(DayTime)

Specifically, they should truncate silently but error on overflow (see the example below).
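One possible reading of "truncate silently but error on overflow" for the narrowing casts is sketched below in plain Rust, without the arrow crate. The struct, the error enum, and both functions are hypothetical. The specific rules are assumptions made for illustration rather than settled behavior: days and nanoseconds are dropped when going to YearMonth, sub-millisecond nanoseconds are dropped when going to DayTime, and a non-zero month count or a millisecond count outside `i32` is reported as an error.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for one IntervalMonthDayNano value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanos: i64,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum CastError {
    /// DayTime has no months field (assumed to be an error here).
    MonthsNotRepresentable,
    /// The millisecond count does not fit in an i32.
    MillisOverflow,
}

/// Keeps the month count and silently drops days and nanoseconds.
fn month_day_nano_to_year_month(v: MonthDayNano) -> i32 {
    v.months
}

/// Returns (days, milliseconds), truncating sub-millisecond precision.
fn month_day_nano_to_day_time(v: MonthDayNano) -> Result<(i32, i32), CastError> {
    if v.months != 0 {
        return Err(CastError::MonthsNotRepresentable);
    }
    // Integer division truncates toward zero.
    let millis = v.nanos / 1_000_000;
    let millis = i32::try_from(millis).map_err(|_| CastError::MillisOverflow)?;
    Ok((v.days, millis))
}

fn main() {
    let v = MonthDayNano { months: 0, days: 2, nanos: 7_500_000 };
    // The extra 500_000 ns are truncated away.
    println!("{:?}", month_day_nano_to_day_time(v));
}
```

Whether the non-representable cases should surface as an error or as a null would presumably follow the `safe` flag of `CastOptions`, as it does for the timestamp casts.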

Describe alternatives you've considered

Additional context

Example desired Interval casts

fn cast_interval_units() {
    // want to be able to cast to/from different interval units
    let interval_year_month = IntervalYearMonthArray::from(vec![
        // 1 year 5 months
        Some(IntervalYearMonthType::make_value(1, 5)),
        None,
    ]);

    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(YearMonth) to Interval(MonthDayNano) not supported")', src/main.rs:55:112
    //let interval_month_day_nanos = cast(&interval_year_month, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();

    let interval_day_time = IntervalDayTimeArray::from(vec![
        // 2 days 7 milliseconds
        Some(IntervalDayTimeType::make_value(2, 7)),
        None,
    ]);

    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(DayTime) to Interval(MonthDayNano) not supported")', src/main.rs:65:110
    // let interval_month_day_nanos = cast(&interval_day_time, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();


    // Somewhat trickier is how to go from MonthDayNano back to lower precision intervals:
    //
    let interval_month_day_nano = IntervalMonthDayNanoArray::from(vec![
        // 1 month 5 days 0 nanoseconds
        // (could losslessly cast to Interval(YearMonth) but not Interval(DayTime))
        Some(IntervalMonthDayNanoType::make_value(1, 5, 0)),

        // 0 months 2 days and 7 milliseconds
        // (could losslessly cast to Interval(DayTime) but not Interval(YearMonth))
        Some(IntervalMonthDayNanoType::make_value(0, 2, 7 * 1_000_000)),

        // 2M months would overflow Interval(DayTime) but not Interval(YearMonth)
        Some(IntervalMonthDayNanoType::make_value(2_000_000, 0, 0)),

        None,
    ]);
}

Example timestamp cast behavior

fn cast_timestamps() {
    let arr = TimestampNanosecondArray::from(vec![
        1_000,
        1_000_000,
        1_000_000_000,
        1_000_000_000_000,
        1_000_000_000_000_000,
        1_000_000_000_000_000_000,
        2_000_000_000_000_000_000,
    ]);

    let types = vec![
        DataType::Date32,
        DataType::Date64,
        DataType::Timestamp(TimeUnit::Second, None),
        DataType::Timestamp(TimeUnit::Millisecond, None),
        DataType::Timestamp(TimeUnit::Microsecond, None),
        DataType::Timestamp(TimeUnit::Nanosecond, None),
    ];

    for dt in &types {
        let out = cast(&arr, dt).unwrap();

        let col_title = format!("{:?}", dt);
        println!("output:\n{}",
                 pretty_format_columns(&col_title, &[out]).unwrap()
        );
    }

    // however, error / null on overflow
    let arr = TimestampMicrosecondArray::from(vec![
        2_000_000_000_000_000_000,
    ]);
    let out = cast(&arr, &DataType::Timestamp(TimeUnit::Nanosecond, None)).unwrap();

    println!("output:\n{}",
             pretty_format_columns("ms -> ns", &[out]).unwrap()
    );

    let out = cast_with_options(
        &arr,
        &DataType::Timestamp(TimeUnit::Nanosecond, None),
        &CastOptions {safe: false},
    ).unwrap();
}

Produces output:

output:
+------------+
| Date32     |
+------------+
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-01 |
| 1970-01-12 |
| 2001-09-09 |
| 2033-05-18 |
+------------+
output:
+-------------------------+
| Date64                  |
+-------------------------+
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:00.001 |
| 1970-01-01T00:00:01     |
| 1970-01-01T00:16:40     |
| 1970-01-12T13:46:40     |
| 2001-09-09T01:46:40     |
| 2033-05-18T03:33:20     |
+-------------------------+
output:
+-------------------------+
| Timestamp(Second, None) |
+-------------------------+
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:00     |
| 1970-01-01T00:00:01     |
| 1970-01-01T00:16:40     |
| 1970-01-12T13:46:40     |
| 2001-09-09T01:46:40     |
| 2033-05-18T03:33:20     |
+-------------------------+
output:
+------------------------------+
| Timestamp(Millisecond, None) |
+------------------------------+
| 1970-01-01T00:00:00          |
| 1970-01-01T00:00:00.001      |
| 1970-01-01T00:00:01          |
| 1970-01-01T00:16:40          |
| 1970-01-12T13:46:40          |
| 2001-09-09T01:46:40          |
| 2033-05-18T03:33:20          |
+------------------------------+
output:
+------------------------------+
| Timestamp(Microsecond, None) |
+------------------------------+
| 1970-01-01T00:00:00.000001   |
| 1970-01-01T00:00:00.001      |
| 1970-01-01T00:00:01          |
| 1970-01-01T00:16:40          |
| 1970-01-12T13:46:40          |
| 2001-09-09T01:46:40          |
| 2033-05-18T03:33:20          |
+------------------------------+
output:
+-----------------------------+
| Timestamp(Nanosecond, None) |
+-----------------------------+
| 1970-01-01T00:00:00.000001  |
| 1970-01-01T00:00:00.001     |
| 1970-01-01T00:00:01         |
| 1970-01-01T00:16:40         |
| 1970-01-12T13:46:40         |
| 2001-09-09T01:46:40         |
| 2033-05-18T03:33:20         |
+-----------------------------+
output:
+----------+
| ms -> ns |
+----------+
|          |
+----------+
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ComputeError("Overflow happened on: 2000000000000000000 * 1000")', src/main.rs:96:7
stack backtrace:
   0: rust_begin_unwind
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/panicking.rs:64:14
   2: core::result::unwrap_failed
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/result.rs:1790:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/result.rs:1112:23
   4: rust_arrow_playground::cast_timestamps
             at ./src/main.rs:92:15
   5: rust_arrow_playground::main
             at ./src/main.rs:16:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
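The last two results above (an empty, i.e. null, cell under the default options and an error with `safe: false`) can be pictured as checked multiplication. The sketch below is plain Rust rather than the arrow implementation; `micros_to_nanos` and its `safe` parameter are hypothetical and only mimic the observed behavior and error text.

```rust
/// Hypothetical helper: scales microseconds to nanoseconds.
/// With `safe == true` an overflow becomes a null (None);
/// with `safe == false` it becomes an error.
fn micros_to_nanos(value: i64, safe: bool) -> Result<Option<i64>, String> {
    match value.checked_mul(1_000) {
        Some(nanos) => Ok(Some(nanos)),
        None if safe => Ok(None),
        None => Err(format!("Overflow happened on: {} * {}", value, 1_000)),
    }
}

fn main() {
    let big = 2_000_000_000_000_000_000_i64;
    // Null on overflow, like the blank "ms -> ns" cell.
    println!("{:?}", micros_to_nanos(big, true));
    // Error on overflow, like the panic message from unwrap().
    println!("{:?}", micros_to_nanos(big, false));
}
```

The proposed interval casts could apply the same pattern wherever a narrowing conversion exceeds the target range.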
