Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374
In #5508, @RinChanNOWWW tracked adding casting to/from StringArray 🙏 ❤️
This ticket tracks adding additional data type support for StringViewArray and ByteViewArray in the cast kernel: https://docs.rs/arrow/latest/arrow/compute/kernels/cast/index.html
Many systems (e.g. InfluxDB 3.0, Apache DataFusion Comet, and I think Coralogix) use DictionaryArrays. Supporting casts to/from DictionaryArray will therefore be important for easy integration into downstream consumers.
Describe the solution you'd like
Specifically the following conversions should be supported in the cast kernels:
- StringViewArray <--> DictionaryArray<IndexType, Utf8>
- StringViewArray <--> DictionaryArray<IndexType, LargeUtf8>
And similarly for Binary:
- BinaryViewArray <--> DictionaryArray<IndexType, Binary>
- BinaryViewArray <--> DictionaryArray<IndexType, LargeBinary>
Notes:
- Good test coverage is the most important part of this ticket
- I recommend smaller PRs if possible
- I think DictionaryArray<IndexType, LargeUtf8> --> StringViewArray can be implemented without copying strings
- I think StringViewArray --> DictionaryArray<IndexType, LargeUtf8> will likely require copying the strings
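To illustrate the intuition behind the last two notes, here is a minimal std-only sketch. It is not arrow-rs code: `DictModel`, `View`, `dict_to_views`, and `views_to_dict` are hypothetical names, and the layout is simplified (real Arrow views are 16 bytes, can reference several buffers, and inline strings of 12 bytes or fewer). Under those assumptions, going from a dictionary to views only needs one view per row pointing into the existing values buffer, while going from views to a dictionary has to gather the distinct strings into a new contiguous values buffer.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of DictionaryArray<u32, LargeUtf8>:
/// `keys[i]` selects a value; values live in one contiguous buffer with offsets.
struct DictModel {
    keys: Vec<u32>,
    offsets: Vec<usize>, // length = number of distinct values + 1
    data: Vec<u8>,
}

/// Hypothetical, simplified view: an (offset, len) pair into a shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Dictionary -> views: no string bytes are copied. Each row gets a small
/// view computed from its key; the views refer to `dict.data`.
fn dict_to_views(dict: &DictModel) -> Vec<View> {
    dict.keys
        .iter()
        .map(|&k| {
            let k = k as usize;
            View {
                offset: dict.offsets[k],
                len: dict.offsets[k + 1] - dict.offsets[k],
            }
        })
        .collect()
}

/// Views -> dictionary: each distinct string is copied once into a fresh
/// contiguous values buffer, because the viewed bytes need not be contiguous
/// or deduplicated in the source buffer.
fn views_to_dict(buffer: &[u8], views: &[View]) -> DictModel {
    let mut seen: HashMap<&[u8], u32> = HashMap::new();
    let mut dict = DictModel {
        keys: Vec::new(),
        offsets: vec![0],
        data: Vec::new(),
    };
    for v in views {
        let bytes = &buffer[v.offset..v.offset + v.len];
        let next_key = seen.len() as u32;
        let key = *seen.entry(bytes).or_insert_with(|| {
            dict.data.extend_from_slice(bytes); // the copy happens here
            dict.offsets.push(dict.data.len());
            next_key
        });
        dict.keys.push(key);
    }
    dict
}

fn main() {
    let dict = DictModel {
        keys: vec![0, 1, 0],
        offsets: vec![0, 5, 10],
        data: b"helloworld".to_vec(),
    };
    let views = dict_to_views(&dict);
    let back = views_to_dict(&dict.data, &views);
    println!("views = {:?}, keys after round trip = {:?}", views, back.keys);
}
```

The sketch ignores nulls, key overflow, and UTF-8 validation, all of which a real cast kernel would have to handle and test.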
Describe alternatives you've considered
I think casting from Dictionary
Additional context