Improve performance of casting `DictionaryArray` to `StringViewArray` by XiangpengHao · Pull Request #5871 · apache/arrow-rs

XiangpengHao · 2024-06-11T14:29:11Z

Which issue does this PR close?

Part of #5861 .

Rationale for this change

Casting from DictionaryArray to String/BinaryView was previously handled by unpack_dictionary: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast/dictionary.rs#L93-L105

which incurs unnecessary copy to the value buffer. This pr handles Utf8View and BinaryView so that it will reuse the value buffer instead of creating a new one.

What changes are included in this PR?

Are there any user-facing changes?

No

alamb

Thank you @XiangpengHao -- This is looking great.

arrow-cast/src/cast/dictionary.rs

alamb · 2024-06-11T15:35:28Z

arrow-cast/src/cast/dictionary.rs

+                StringViewArray::new_unchecked(
+                    view_buffer,
+                    vec![value_buffer],
+                    dict_array.nulls().cloned(),


I think calling nulls() doesn't handle the case where the dictionary value itself (rather than the key) was null

arrow-rs/arrow-array/src/array/dictionary_array.rs

Line 727 in cf59b6c

fn nulls(&self) -> Option<&NullBuffer> {

I think this should call logical_nulls() instead:

arrow-rs/arrow-array/src/array/dictionary_array.rs

Line 731 in cf59b6c

fn logical_nulls(&self) -> Option<NullBuffer> {

Also, it would be good to create a test case that covers this too

Nice catch! added a new test to cover this

(fyi there seem to be no new commits to this PR)

alamb · 2024-06-11T15:37:50Z

arrow-cast/src/cast/mod.rs

+    #[test]
+    fn test_dict_to_view() {
+        let string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA);
+        let string_dict_array: DictionaryArray<Int8Type> = VIEW_TEST_DATA.into_iter().collect();


Can you please update this test to have a dictionary array that has:

Repeated use of dictionary values

keys that are not all increasing

Nulls in the values (as well as the keys)

Perhaps what you can do is create a StringArray from VIEW_TEST_DATA and then make create the indexes manually

Like this example https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#example-from-existing-arrays

Does that make sense?

alamb · 2024-06-11T15:40:53Z

arrow-cast/src/cast/dictionary.rs

+                let length = end - offset;
+                let value_buf = &value_buffer[offset as usize..end as usize];
+
+                if length <= 12 {


Instead of creating the views directly, what do you think about using try_append_view and try_append_block added in this PR: #5796

Maybe as bonus points you could add a benchmark for this casting operation, and see if adding append_view_unchecked would make any difference

Unfortunately, adding append_view_unchecked improved the performance by 50%

dict to view time: [38.116 µs 38.122 µs 38.127 µs] change: [-50.618% -50.455% -50.290%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe

I think it justifies adding append_view_unchecked, what do you think?

Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..

I think it justifies adding append_view_unchecked, what do you think?

Makes sense to me

Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..

I do think we should add the benchmark to the repo (so we can use it for future optimizations)

In terms of justifying append_view_unsafe I think running the benchmark on a local checkout that calls append_view and then revert and run the same benchmark is fine (which is presumably what you did)

XiangpengHao · 2024-06-11T20:08:09Z

BTW, with this pull request, casting a dictionary array to string view with 10_000 items improved performance by 25%

dict to view            time:   [37.979 µs 37.990 µs 38.001 µs]
                        change: [-25.271% -25.253% -25.237%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

alamb

Looks good to me -- I think this could be made significantly faster but I think we should add a benchmark and optimize as a follow on PR

Thank you @XiangpengHao

alamb · 2024-06-11T20:08:42Z

arrow-cast/src/cast/mod.rs

+        let typed_dict = string_dict_array.downcast_dict::<StringArray>().unwrap();
+
+        let string_view_array = {
+            let mut builder = StringViewBuilder::new().with_block_size(8); // multiple buffers.


alamb · 2024-06-11T20:10:29Z

arrow-cast/src/cast/dictionary.rs

+    let value_offsets = array.value_offsets();
+    let mut builder = GenericByteViewBuilder::<T>::with_capacity(keys.len());
+    builder.append_block(value_buffer.clone());
+    for i in keys.iter() {


Another potential optimization is a separate loop if there are no nulls in keys (so we can avoid the branch)

Another potental idea is to use value_offsets.windows(2) as an iterator to avoid the bounds checks in value_offsets

However, I think we should merge this basic PR in as is, and then add a bencmark and optimize this kenrnel as a follow on PR (if we care). I can file a ticket if @tustvold agrees

ops, I checked in a append_view_unchecked.

But if we do try_append_view, it is even slower than the unpack_dictionary approach

I can also move append_view_unchecked as a follow-up PR and potentially clean up other use cases where ByteViews are manually constructed.

alamb

Thanks @XiangpengHao --

This looks really nice to me. Can you please make a separate PR with the cast dictionary --> view benchmark? Then we can use that benchmark to ensure that this approach is faster than "unpack dictionary" (which I am sure it will be)

alamb · 2024-06-12T14:44:03Z

arrow-array/src/builder/generic_bytes_view_builder.rs

-            };
-            self.views_builder.append(view.into());
+        unsafe {
+            self.append_view_unchecked(block, offset, len);


alamb · 2024-06-13T10:05:27Z

My plan is to merge this branch up to main, run the benchmarks added in #5874 on it and then assuming they look good merge it in

alamb · 2024-06-13T11:45:20Z

The bechmark I ran suggests this kernel is almost 2x as fast as the existing approach (which makes sense as it avoids a copy).

Nice work @XiangpengHao

++ critcmp master view-to-dict
group                       master                                 view-to-dict
-----                       ------                                 ------------
cast dict to string view    1.79    123.2±1.68µs        ? ?/sec    1.00     68.8±2.76µs        ? ?/sec
cast string view to dict    1.00    249.9±0.68µs        ? ?/sec    1.07    268.3±0.53µs        ? ?/sec

zero-copy dict to view

63db275

github-actions bot added the arrow Changes to the arrow crate label Jun 11, 2024

alamb reviewed Jun 11, 2024

View reviewed changes

refactor to use try_append_view

0308dd4

alamb approved these changes Jun 11, 2024

View reviewed changes

XiangpengHao added 2 commits June 11, 2024 16:17

unchecked view

c27d548

make fmt happy

fdebc0e

XiangpengHao mentioned this pull request Jun 11, 2024

Improve performance of casting StringView/BinaryView to DictionaryArray #5872

Merged

alamb approved these changes Jun 12, 2024

View reviewed changes

XiangpengHao mentioned this pull request Jun 12, 2024

Benchmark for casting view to dict arrays (and the reverse) #5874

Merged

alamb added 2 commits June 13, 2024 06:06

Merge remote-tracking branch 'apache/master' into view-to-dict

ca11706

Merge remote-tracking branch 'apache/master' into view-to-dict

52c28c1

alamb changed the title ~~zero copy cast dict array to view type arrays~~ Improve performance of casting DictionaryArray to StringViewArray Jun 13, 2024

alamb merged commit 77271c4 into apache:master Jun 13, 2024

Jefffrey mentioned this pull request Nov 14, 2025

Cannot cast string dictionary to binary view #8841

Closed

Comments

Conversation

XiangpengHao commented Jun 11, 2024

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

XiangpengHao commented Jun 11, 2024

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alamb commented Jun 13, 2024

Uh oh!

alamb commented Jun 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants