feat: add substrait support for Interval types and literals#10646
feat: add substrait support for Interval types and literals#10646comphead merged 4 commits intoapache:mainfrom
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
| /// - days: `i32` | ||
| /// - nanoseconds: `i64` | ||
| /// | ||
| /// See also [`ScalarValue::IntervalMonthDayNano`] for the literal definition in DataFusion. |
There was a problem hiding this comment.
FYI the interval implementation is changing in the next arrow I think: apache/arrow-rs#5769
There was a problem hiding this comment.
Thanks for informing, I'd like to help migrate to the new arrow version. BTW, is there any plan for next bump?
There was a problem hiding this comment.
I think ETA is next week sometime apache/arrow-rs#5688
Maybe you can make a "pre-release" of DataFusion against the un-released version of arrow-rs (which @tustvold often does to make sure as a way to sanity check the release)
| }, | ||
| r#type::Kind::UserDefined(u) => { | ||
| match u.type_reference { | ||
| INTERVAL_YEAR_MONTH_TYPE_REF => { |
There was a problem hiding this comment.
I may be totally wrong, but are you sure this is how type_reference is supposed to work? I'd kind of expect the type_reference to map to an extension / MappingType::ExtensionType, kinda like function_reference.
There was a problem hiding this comment.
I am not at all sure how this is supposed to work -- when I was reviewing this PR I think I concluded it followed the existing patterns.
If you have additional information / knowledge that would help improve things I think we would welcome that help
There was a problem hiding this comment.
There was a problem hiding this comment.
I see, actually I think both references (type_variation_reference and type_reference) have the same problem - I hadn't realized it affects the type_variation_reference too. Now, if each system defines the type/variation references as consts, then plans will look compatible, but may have different meaning (nothing tells another Substrait producer/consumer that type_variation_reference = 1 on a list means a large list, for example).
I believe both should be defined as SimpleExtension files, following the schema (like here https://github.com/apache/arrow/blob/main/format/substrait/extension_types.yaml) rather than as constants (kinda what
already says 😅). And then (I think) in the plan itself, there'd be instances of the SimpleExtensionDeclaration messages (https://github.com/substrait-io/substrait/blob/4f5b4ac4d473c9f03f30f86eca080073d99ef1c7/proto/substrait/extensions/extensions.proto#L39), and the type_reference and type_variation_reference would link to the type_anchor and type_variation_anchor, rather than to the hard-coded constants.That way one can (in theory, at least) teach another Substrait producer/consumer about the DataFusion extensions and keep plans compatible, or at least the systems will recognize that the plans are incompatible as it refers to extension URIs that the other system doesn't know about.
Does that make sense? I'm not 100% sure of anything I'm saying here, so I may as be understanding something wrong.
There was a problem hiding this comment.
I believe both should be defined as SimpleExtension files
Yes, that's right. The last (I'm aware of, at least) big piece we are missing in substrait is those *.yaml spec for all extended things and related URL settings. At present, all the things are defined in the document.
From the substrait website, we'll need a yaml parsing component to support extensions from other systems as well, if we are going to implement the ability to consume plans from external systems.
There was a problem hiding this comment.
going to file a tracking issue for tasks related to substrait support
There was a problem hiding this comment.
Ah cool, so it was the plan all along :) Sg!
…0646) * feat: support interval types Signed-off-by: Ruihang Xia <[email protected]> * impl literals Signed-off-by: Ruihang Xia <[email protected]> * fix deadlink in doc Signed-off-by: Ruihang Xia <[email protected]> --------- Signed-off-by: Ruihang Xia <[email protected]>
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Support convert to/from three interval types and literals in substrait
Are these changes tested?
Yes. There is a new UT
roundtrip_interval_literalthat covers both type and literals.Are there any user-facing changes?