GenericByteViewArray::slice is not zero-copy but ought to be #9014

@maxburke

Description

The doc comment for GenericByteViewArray::slice says that it's zero-copy, but it isn't.

This thread from the DataFusion Slack channel has some details about a particularly gnarly query I was trying to debug: https://the-asf.slack.com/archives/C04RJ0C85UZ/p1765917347919129

The upshot is that if the query uses Utf8 it finishes pretty quickly, but if it uses Utf8View, it is so slow that it effectively never completes because of all the allocation + reallocation + copying that happens when a GenericByteViewArray is sliced.

I hacked up my local versions of DataFusion and Arrow to make the GenericByteViewArray::buffers field an Arc<Vec<Buffer>>, and it improved the performance dramatically. It wasn't quite as fast as the plain-Utf8 version, possibly because my implementation was pretty hacky, but it at least completed.
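To illustrate the difference, here is a minimal std-only sketch, not the actual arrow-rs code. It assumes the per-slice cost comes from cloning the list of buffer handles. Buffer, VecBacked, ArcBacked and make_buffers are hypothetical stand-ins invented for this example, and the views are simplified to a shared slice plus an offset and length. Neither layout copies the underlying bytes, but the Vec layout allocates a new Vec and bumps one reference count per buffer on every slice, while the Arc layout bumps a single reference count.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted data buffer
/// (NOT arrow's `Buffer` type).
type Buffer = Arc<[u8]>;

/// Layout A (assumed to mirror the current behaviour): every array
/// owns its own `Vec` of buffer handles.
pub struct VecBacked {
    pub views: Arc<[u128]>,
    pub offset: usize,
    pub len: usize,
    pub buffers: Vec<Buffer>,
}

/// Layout B (the workaround described above): the handle list itself
/// is shared behind an `Arc`.
pub struct ArcBacked {
    pub views: Arc<[u128]>,
    pub offset: usize,
    pub len: usize,
    pub buffers: Arc<Vec<Buffer>>,
}

impl VecBacked {
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            // Allocates a fresh Vec and increments one refcount per
            // buffer: O(number of buffers) work for every slice.
            buffers: self.buffers.clone(),
        }
    }
}

impl ArcBacked {
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            // A single refcount increment, independent of how many
            // buffers the array references.
            buffers: Arc::clone(&self.buffers),
        }
    }
}

/// Hypothetical helper: builds `n` small dummy buffers.
pub fn make_buffers(n: usize) -> Vec<Buffer> {
    (0..n).map(|i| Buffer::from(vec![i as u8; 16])).collect()
}
```

With an array that references many buffers, a query that slices repeatedly pays the Vec clone on each call in layout A, which would be consistent with the slowdown described above. Layout B keeps slicing constant-time with respect to the number of buffers.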
