Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Related to #6163
The take kernel for StringView and BinaryView is implemented using GenericByteViewArray::new(), a safe constructor that does full UTF-8 validation of all non-inlined strings in the buffers. This is kind of silly, given we're not even constructing new string data, just copying over the existing buffers, which are already known to contain well-formed UTF-8 values.
In Vortex, I'm seeing this show up in the profiles for TPC-H queries as one of the more prominent items, in many cases causing a regression of up to 50% over Utf8.
Describe the solution you'd like
The take kernel for StringView/BinaryView should do what the take_byte kernel for Utf8/Binary arrays already does: that kernel constructs an ArrayData instance and does not perform UTF-8 validation, since we're taking from an already known-good Utf8 array.
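To illustrate the idea without depending on arrow-rs internals, here is a minimal std-only sketch. Everything in it is hypothetical: `ToyStringViewArray`, `try_new`, `new_unchecked`, and `take` are made-up names for a toy model in which a view is just an (offset, length) pair into one shared buffer. It is not the real view layout or the real kernel. It only shows the pattern being requested: validate once when the array is built, then skip validation when take merely copies views over the same buffer.

```rust
use std::sync::Arc;

/// Hypothetical toy model of a string view array (not the arrow-rs type):
/// each view is an (offset, length) pair into a shared, immutable buffer.
#[derive(Debug, Clone)]
struct ToyStringViewArray {
    views: Vec<(usize, usize)>,
    buffer: Arc<Vec<u8>>,
}

impl ToyStringViewArray {
    /// Safe constructor: checks that every view is in bounds and points at
    /// valid UTF-8. This is the cost that is wasted if paid again on take.
    fn try_new(views: Vec<(usize, usize)>, buffer: Arc<Vec<u8>>) -> Result<Self, String> {
        for &(offset, len) in &views {
            let end = offset.checked_add(len).ok_or("view range overflows")?;
            let bytes = buffer
                .get(offset..end)
                .ok_or_else(|| format!("view ({}, {}) is out of bounds", offset, len))?;
            std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        }
        Ok(Self { views, buffer })
    }

    /// Unchecked constructor: performs no validation at all.
    ///
    /// # Safety
    /// The caller must guarantee that every view is in bounds of `buffer`
    /// and covers valid UTF-8.
    unsafe fn new_unchecked(views: Vec<(usize, usize)>, buffer: Arc<Vec<u8>>) -> Self {
        Self { views, buffer }
    }

    /// Returns the string at position `i` (panics if `i` is out of range).
    fn value(&self, i: usize) -> &str {
        let (offset, len) = self.views[i];
        // SAFETY: both constructors require in-bounds, valid UTF-8 views.
        unsafe { std::str::from_utf8_unchecked(&self.buffer[offset..offset + len]) }
    }

    /// Take: copies the selected views and shares the existing buffer.
    /// Panics if any index is out of range.
    fn take(&self, indices: &[usize]) -> Self {
        let views: Vec<(usize, usize)> = indices.iter().map(|&i| self.views[i]).collect();
        // SAFETY: every view is copied from `self`, which was validated on
        // construction, and the buffer is shared unchanged.
        unsafe { Self::new_unchecked(views, Arc::clone(&self.buffer)) }
    }
}
```

In this sketch, `take` does work proportional to the number of indices rather than to the total length of the selected strings, and the data buffer is never copied or re-scanned. The safety argument carries over only as long as take never produces a view that was not already present in the validated source array.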
Describe alternatives you've considered
Additional context
