Skip to content

Comments

Improve performance of casting StringView/BinaryView to DictionaryArray#5872

Merged
alamb merged 10 commits intoapache:masterfrom
XiangpengHao:dict-to-view
Jun 13, 2024
Merged

Improve performance of casting StringView/BinaryView to DictionaryArray#5872
alamb merged 10 commits intoapache:masterfrom
XiangpengHao:dict-to-view

Conversation

@XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Jun 11, 2024

Which issue does this PR close?

Closes #5861 . This pr should only be reviewed/merged after #5871 is merged.

Rationale for this change

Cast view types to dictionary arrays, the opposite of #5871

Note that casting was handled by pack_byte_to_dictionary:
https://github.com/XiangpengHao/arrow-rs/blob/dict-to-view/arrow-cast/src/cast/dictionary.rs#L307

But it takes two steps: Utf8View -> Utf8 -> Dictionary, which causes two copies.
This PR only do one copy (directly from Utf8View -> Dictionary<>)

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 11, 2024
@alamb alamb changed the title View types to DictionaryArray Add cast View types to DictionaryArray Jun 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @XiangpengHao --

Since the rationale for these changes is performance, I think we should have benchmark results showing how much faster it is.

Thus, can you please make a PR with a banchmark (a single PR with benchmarks both for this change and the one in #5871)

Then we'll use the benchmarks to verify that this PR really does improve performance (I am sure it will, but we should double check)

@alamb alamb changed the title Add cast View types to DictionaryArray Improve performance of casting StringView/BinaryView to DictionaryArray Jun 13, 2024
@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

I took the liberty of merging this PR up from main. Running performance benchmarks on it now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measure a 30% performance improvement on this branch. Thank you @XiangpengHao

++ critcmp master dict-to-view
group                       dict-to-view                           master
-----                       ------------                           ------
cast dict to string view    1.00     69.0±2.58µs        ? ?/sec    1.02    70.6±12.48µs        ? ?/sec
cast string view to dict    1.00    203.3±0.32µs        ? ?/sec    1.34    272.1±0.69µs        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

Does this PR close #5861 I wonder 🤔

@XiangpengHao
Copy link
Contributor Author

Does this PR close #5861 I wonder 🤔

I guess it should. Are there any remaining items?

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

I guess it should. Are there any remaining items?

Not that I know of

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cast kernel support for StringViewArray and BinaryViewArray <--> DictionaryArray`

2 participants