cast kernel support for StringViewArray and BinaryViewArray <--> DictionaryArray #5861

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374

In #5508, @RinChanNOWWW tracked adding casting to/from StringArray 🙏 ❤️

This ticket tracks adding additional data type support for StringViewArray and BinaryViewArray in the cast kernel: https://docs.rs/arrow/latest/arrow/compute/kernels/cast/index.html

Many systems (e.g. InfluxDB 3.0, Apache DataFusion Comet, and I think Coralogix) use DictionaryArrays, so supporting casts to and from DictionaryArray is important for easy integration into downstream consumers.

Describe the solution you'd like

Specifically the following conversions should be supported in the cast kernels:

  • StringViewArray <--> DictionaryArray<IndexType, Utf8>
  • StringViewArray <--> DictionaryArray<IndexType, LargeUtf8>

And similarly for Binary:

  • BinaryViewArray <--> DictionaryArray<IndexType, Binary>
  • BinaryViewArray <--> DictionaryArray<IndexType, LargeBinary>

Notes:

  1. Good test coverage is the most important part of this ticket
  2. I recommend smaller PRs if possible
  3. I think DictionaryArray<IndexType, LargeUtf8> --> StringViewArray can be implemented without copying strings
  4. I think StringViewArray --> DictionaryArray<IndexType, LargeUtf8> will likely require copying the strings
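To make notes 3 and 4 concrete, here is a minimal std-only sketch of why the two directions differ. It is a simplified model and not the arrow-rs API: `Dict`, `View`, `Views`, `dict_to_views`, and `views_to_dict` are hypothetical names, nulls are ignored, and the real view layout (inlined short strings, multiple data buffers) is left out. The assumption it illustrates is that a view can point into an existing values buffer as long as lengths and offsets fit in 32 bits, while a dictionary needs its distinct values packed into one contiguous buffer.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for a dictionary of strings (hypothetical, not arrow-rs).
/// Value `k` occupies `data[offsets[k]..offsets[k + 1]]`.
struct Dict {
    keys: Vec<usize>,
    offsets: Vec<usize>,
    data: Arc<[u8]>,
}

/// Simplified stand-in for a string view: a length plus an offset into a shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    len: u32,
    offset: u32,
}

struct Views {
    views: Vec<View>,
    data: Arc<[u8]>,
}

/// Dictionary -> views: only the small `View` structs are built; the string
/// bytes are shared via `Arc::clone`. Returns `None` if a length or offset
/// does not fit in `u32`, which is where a large-offset dictionary could
/// not be represented this way.
fn dict_to_views(dict: &Dict) -> Option<Views> {
    let mut views = Vec::with_capacity(dict.keys.len());
    for &k in &dict.keys {
        let (start, end) = (dict.offsets[k], dict.offsets[k + 1]);
        views.push(View {
            len: u32::try_from(end - start).ok()?,
            offset: u32::try_from(start).ok()?,
        });
    }
    Some(Views { views, data: Arc::clone(&dict.data) })
}

/// Views -> dictionary: distinct values are deduplicated and copied into a
/// fresh contiguous buffer, because views may reference arbitrary,
/// possibly repeated or unordered, regions of the source buffer.
fn views_to_dict(v: &Views) -> Dict {
    let mut seen: HashMap<&[u8], usize> = HashMap::new();
    let mut keys = Vec::with_capacity(v.views.len());
    let mut offsets = vec![0usize];
    let mut data: Vec<u8> = Vec::new();
    for view in &v.views {
        let start = view.offset as usize;
        let bytes = &v.data[start..start + view.len as usize];
        let next = seen.len();
        let key = *seen.entry(bytes).or_insert_with(|| {
            data.extend_from_slice(bytes); // the copy that note 4 anticipates
            offsets.push(data.len());
            next
        });
        keys.push(key);
    }
    Dict { keys, offsets, data: data.into() }
}
```

In this model, `dict_to_views` returns views whose `data` is the same allocation as the dictionary's (checkable with `Arc::ptr_eq`), whereas `views_to_dict` always allocates a new buffer. Whether the actual kernels can share buffers the same way depends on the arrow-rs buffer and view representations, so tests for the real casts should cover long strings, repeated keys, and nulls.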

Describe alternatives you've considered
I think casting from Dictionary

Additional context

Labels

arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)