Improve performance of casting DictionaryArray to StringViewArray#5871
Improve performance of casting DictionaryArray to StringViewArray#5871alamb merged 6 commits intoapache:masterfrom
DictionaryArray to StringViewArray#5871Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @XiangpengHao -- This is looking great.
arrow-cast/src/cast/dictionary.rs
Outdated
| StringViewArray::new_unchecked( | ||
| view_buffer, | ||
| vec![value_buffer], | ||
| dict_array.nulls().cloned(), |
There was a problem hiding this comment.
I think calling nulls() doesn't handle the case where the dictionary value itself (rather than the key) was null
I think this should call logical_nulls() instead:
Also, it would be good to create a test case that covers this too
There was a problem hiding this comment.
Nice catch! added a new test to cover this
There was a problem hiding this comment.
(fyi there seem to be no new commits to this PR)
arrow-cast/src/cast/mod.rs
Outdated
| #[test] | ||
| fn test_dict_to_view() { | ||
| let string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA); | ||
| let string_dict_array: DictionaryArray<Int8Type> = VIEW_TEST_DATA.into_iter().collect(); |
There was a problem hiding this comment.
Can you please update this test to have a dictionary array that has:
- Repeated use of dictionary values
- keys that are not all increasing
- Nulls in the values (as well as the keys)
Perhaps what you can do is create a StringArray from VIEW_TEST_DATA and then make create the indexes manually
Like this example https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#example-from-existing-arrays
Does that make sense?
arrow-cast/src/cast/dictionary.rs
Outdated
| let length = end - offset; | ||
| let value_buf = &value_buffer[offset as usize..end as usize]; | ||
|
|
||
| if length <= 12 { |
There was a problem hiding this comment.
Instead of creating the views directly, what do you think about using try_append_view and try_append_block added in this PR: #5796
Maybe as bonus points you could add a benchmark for this casting operation, and see if adding append_view_unchecked would make any difference
There was a problem hiding this comment.
Unfortunately, adding append_view_unchecked improved the performance by 50%
dict to view time: [38.116 µs 38.122 µs 38.127 µs]
change: [-50.618% -50.455% -50.290%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
I think it justifies adding append_view_unchecked, what do you think?
Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..
There was a problem hiding this comment.
I think it justifies adding append_view_unchecked, what do you think?
Makes sense to me
Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..
I do think we should add the benchmark to the repo (so we can use it for future optimizations)
In terms of justifying append_view_unsafe I think running the benchmark on a local checkout that calls append_view and then revert and run the same benchmark is fine (which is presumably what you did)
|
BTW, with this pull request, casting a dictionary array to string view with 10_000 items improved performance by 25% |
alamb
left a comment
There was a problem hiding this comment.
Looks good to me -- I think this could be made significantly faster but I think we should add a benchmark and optimize as a follow on PR
Thank you @XiangpengHao
| let typed_dict = string_dict_array.downcast_dict::<StringArray>().unwrap(); | ||
|
|
||
| let string_view_array = { | ||
| let mut builder = StringViewBuilder::new().with_block_size(8); // multiple buffers. |
| let value_offsets = array.value_offsets(); | ||
| let mut builder = GenericByteViewBuilder::<T>::with_capacity(keys.len()); | ||
| builder.append_block(value_buffer.clone()); | ||
| for i in keys.iter() { |
There was a problem hiding this comment.
Another potential optimization is a separate loop if there are no nulls in keys (so we can avoid the branch)
Another potental idea is to use value_offsets.windows(2) as an iterator to avoid the bounds checks in value_offsets
However, I think we should merge this basic PR in as is, and then add a bencmark and optimize this kenrnel as a follow on PR (if we care). I can file a ticket if @tustvold agrees
There was a problem hiding this comment.
ops, I checked in a append_view_unchecked.
But if we do try_append_view, it is even slower than the unpack_dictionary approach
There was a problem hiding this comment.
I can also move append_view_unchecked as a follow-up PR and potentially clean up other use cases where ByteViews are manually constructed.
alamb
left a comment
There was a problem hiding this comment.
Thanks @XiangpengHao --
This looks really nice to me. Can you please make a separate PR with the cast dictionary --> view benchmark? Then we can use that benchmark to ensure that this approach is faster than "unpack dictionary" (which I am sure it will be)
| }; | ||
| self.views_builder.append(view.into()); | ||
| unsafe { | ||
| self.append_view_unchecked(block, offset, len); |
|
My plan is to merge this branch up to main, run the benchmarks added in #5874 on it and then assuming they look good merge it in |
|
The bechmark I ran suggests this kernel is almost 2x as fast as the existing approach (which makes sense as it avoids a copy). Nice work @XiangpengHao |
DictionaryArray to StringViewArray
Which issue does this PR close?
Part of #5861 .
Rationale for this change
Casting from
DictionaryArraytoString/BinaryViewwas previously handled byunpack_dictionary: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast/dictionary.rs#L93-L105which incurs unnecessary copy to the value buffer. This pr handles Utf8View and BinaryView so that it will reuse the value buffer instead of creating a new one.
What changes are included in this PR?
Are there any user-facing changes?
No