Avoid copy/allocation when read view types from parquet#5877
Avoid copy/allocation when read view types from parquet#5877alamb merged 2 commits intoapache:masterfrom
Conversation
|
FYI: We discussed in #5557, using an independent view_buffer to replace offset_buffer, see #5557 (comment) for details. |
The |
|
|
||
| if len != 0 { | ||
| builder | ||
| .try_append_view(0, start.as_usize() as u32, len as u32) |
There was a problem hiding this comment.
0 -> append_block's return offset
alamb
left a comment
There was a problem hiding this comment.
Looks great to me -- thank you @XiangpengHao
I ran the benchmarks and it looks 💯
++ critcmp master parquet-view
group master parquet-view
----- ------ ------------
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs 38.83 21.2±0.04ms ? ?/sec 1.00 546.7±1.85µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs 7.61 5.6±0.01ms ? ?/sec 1.00 736.2±1.89µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs 38.75 21.2±0.04ms ? ?/sec 1.00 548.1±1.03µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs 69.37 42.3±0.06ms ? ?/sec 1.00 609.7±1.34µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs 14.44 11.2±0.03ms ? ?/sec 1.00 774.7±1.57µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs 69.40 42.4±0.06ms ? ?/sec 1.00 610.4±1.59µs ? ?/sec
I wonder if we can also use try_append_unchecked and make it even faster?
I am merging this in to keep things moving -- maybe we can consider using cc @ariesdevil |
Which issue does this PR close?
Part of #5530.
An alternative implementation (maybe subset) of #5557
Rationale for this change
This change is very simple -- only 8 lines, but gives us many performance improvements (10-80x):
To reproduce:
What changes are included in this PR?
Are there any user-facing changes?