Improve performance of casting StringView/BinaryView to DictionaryArray#5872
Improve performance of casting StringView/BinaryView to DictionaryArray#5872alamb merged 10 commits intoapache:masterfrom
StringView/BinaryView to DictionaryArray#5872Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @XiangpengHao --
Since the rationale for these changes is performance, I think we should have benchmark results showing how much faster it is.
Thus, can you please make a PR with a banchmark (a single PR with benchmarks both for this change and the one in #5871)
Then we'll use the benchmarks to verify that this PR really does improve performance (I am sure it will, but we should double check)
StringView/BinaryView to DictionaryArray
|
I took the liberty of merging this PR up from main. Running performance benchmarks on it now |
alamb
left a comment
There was a problem hiding this comment.
I measure a 30% performance improvement on this branch. Thank you @XiangpengHao
++ critcmp master dict-to-view
group dict-to-view master
----- ------------ ------
cast dict to string view 1.00 69.0±2.58µs ? ?/sec 1.02 70.6±12.48µs ? ?/sec
cast string view to dict 1.00 203.3±0.32µs ? ?/sec 1.34 272.1±0.69µs ? ?/sec
|
Does this PR close #5861 I wonder 🤔 |
I guess it should. Are there any remaining items? |
Not that I know of |
|
🚀 |
Which issue does this PR close?
Closes #5861 . This pr should only be reviewed/merged after #5871 is merged.
Rationale for this change
Cast view types to dictionary arrays, the opposite of #5871
Note that casting was handled by pack_byte_to_dictionary:
https://github.com/XiangpengHao/arrow-rs/blob/dict-to-view/arrow-cast/src/cast/dictionary.rs#L307
But it takes two steps: Utf8View -> Utf8 -> Dictionary, which causes two copies.
This PR only do one copy (directly from Utf8View -> Dictionary<>)
What changes are included in this PR?
Are there any user-facing changes?