
speedup take_byte_view kernel #6167

@a10y

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Related to #6163

The take kernel for StringView and BinaryView is implemented using GenericByteViewArray::new(), a safe constructor that does full UTF-8 validation of all non-inlined strings in the buffers. This is kind of silly, given we're not even constructing a new array from scratch, just copying over the existing buffers, which are known to contain well-formed UTF-8 values.

In Vortex, I'm seeing this show up in the profiles for TPC-H queries as one of the more prominent items, in many cases causing a regression of up to 50% over Utf8.


Describe the solution you'd like

The take_byte kernel for Utf8/Binary arrays constructs an ArrayData instance directly and does not perform UTF-8 validation, since we're taking from an already known-good Utf8 array. The take_byte_view kernel should do the same.
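
To make the idea concrete, here's a rough std-only sketch of "validate once, then take without re-validating". Everything in it is made up for illustration: `ToyStringViewArray`, its `(offset, len)` views, and its methods are hypothetical and are not the arrow-rs types, view layout, or API. It only shows the shape of the change: `take` gathers views and shares the buffer through an unchecked constructor, relying on the source array having been validated already.

```rust
use std::sync::Arc;

/// Toy stand-in for a string view array (hypothetical, not the arrow-rs layout):
/// each view is an (offset, len) pair into one shared byte buffer.
#[derive(Debug, Clone)]
pub struct ToyStringViewArray {
    views: Vec<(usize, usize)>,
    buffer: Arc<[u8]>,
}

impl ToyStringViewArray {
    /// Safe constructor: checks that every view is in bounds and valid UTF-8.
    /// The cost grows with the total number of bytes the views reference.
    pub fn try_new(views: Vec<(usize, usize)>, buffer: Arc<[u8]>) -> Result<Self, String> {
        for &(offset, len) in &views {
            let end = offset.checked_add(len).ok_or("view overflows")?;
            let bytes = buffer.get(offset..end).ok_or("view out of bounds")?;
            std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        }
        Ok(Self { views, buffer })
    }

    /// Unchecked constructor: no bounds or UTF-8 checks.
    ///
    /// # Safety
    /// Every view must reference in-bounds, valid UTF-8 bytes of `buffer`.
    pub unsafe fn new_unchecked(views: Vec<(usize, usize)>, buffer: Arc<[u8]>) -> Self {
        Self { views, buffer }
    }

    pub fn len(&self) -> usize {
        self.views.len()
    }

    pub fn value(&self, i: usize) -> &str {
        let (offset, len) = self.views[i];
        // SAFETY: both constructors require in-bounds, valid UTF-8 views.
        unsafe { std::str::from_utf8_unchecked(&self.buffer[offset..offset + len]) }
    }

    /// Gathers views by index. The buffer is shared, not copied, and nothing
    /// is re-validated. Panics if an index is out of range.
    pub fn take(&self, indices: &[usize]) -> Self {
        let views = indices.iter().map(|&i| self.views[i]).collect();
        // SAFETY: every gathered view comes from `self`, which already
        // satisfies the constructor contract for this same buffer.
        unsafe { Self::new_unchecked(views, Arc::clone(&self.buffer)) }
    }
}
```

With this shape, the work done by `take` depends on the number of indices rather than on the number of string bytes behind them, which is the property the real kernel would want.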

Describe alternatives you've considered

Additional context

Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)
