Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We struggle with the memory used by the RowConverter when interning values from DictionaryArrays. We are even proposing a special CardinalityAware wrapper on top of the RowConverter in DataFusion (see apache/datafusion#7401)
At the moment, round tripping data from Array to Rows and then back to Array works like this:
DictionaryArray -- (preserve_dictionaries = false) --> Rows --> Primtive/StringArray
In DataFusion we must maintain the same input / output types, so in our proposed improvement we needed to add a call to cast, which @tustvold notes is likely very expensive: https://github.com/apache/arrow-datafusion/pull/7401/files#r1324281222
Describe the solution you'd like
I would like the RowConverter to produce the same output type as the input type on SortField, even if preserve_dictionaries is set to false
This would avoid a copy of the String data and likely perform much better.
Describe alternatives you've considered
We could potentially simply remove stateful row encoding: #4811
Additional context