Improve speed of row converter by skipping utf8 checks #6058

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #5374

@XiangpengHao implemented optimized encoding/decoding between the row format and ByteView (StringView / BinaryView) arrays in #5945 / #6044.

Those PRs also add benchmarks, so we can measure any change 🎉

However, as mentioned in https://github.com/apache/arrow-rs/pull/6044/files#r1676804033, if we know that the Row value was created from valid utf8 values, re-validating utf8 when decoding is unnecessary.
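To make the distinction concrete, here is a minimal sketch using only the Rust standard library (not the arrow-rs row converter itself). The function names `decode_checked` and `decode_unchecked` are made up for illustration. The checked path scans every byte; the unchecked path trusts the caller.

```rust
use std::str::{self, Utf8Error};

/// Checked path: scans every byte to confirm the input is valid UTF-8.
/// (Hypothetical helper name, for illustration only.)
fn decode_checked(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

/// Unchecked path: skips the scan entirely.
/// (Hypothetical helper name, for illustration only.)
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8, for example
/// because the bytes were produced by encoding a valid `&str`.
unsafe fn decode_unchecked(bytes: &[u8]) -> &str {
    // SAFETY: upheld by the caller, per the contract above.
    unsafe { str::from_utf8_unchecked(bytes) }
}
```

If the bytes are known to come from valid strings, both paths return the same `&str`, and the unchecked one avoids the validation pass. Whether that saving matters for the row converter is exactly what the benchmarks would need to show.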

Describe the solution you'd like

Consider an API that would allow skipping utf8 validation when decoding rows.

This would need to be justified by benchmarks showing that skipping validation makes a significant difference in performance.
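As a rough idea of what such a comparison measures, here is a std-only timing sketch. It is not the arrow-rs benchmark suite, and `total_len_checked`, `total_len_unchecked`, and `timed` are hypothetical helpers. A real measurement should use the existing row-format benchmarks rather than a single `Instant` reading.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Sum the string lengths, validating UTF-8 on every decode.
fn total_len_checked(values: &[Vec<u8>]) -> usize {
    values
        .iter()
        .map(|v| {
            std::str::from_utf8(black_box(v.as_slice()))
                .expect("input must be valid UTF-8")
                .len()
        })
        .sum()
}

/// Same computation, but without validation.
///
/// # Safety
/// Every element of `values` must be valid UTF-8.
unsafe fn total_len_unchecked(values: &[Vec<u8>]) -> usize {
    values
        .iter()
        .map(|v| {
            // SAFETY: upheld by the caller, per the contract above.
            let s = unsafe { std::str::from_utf8_unchecked(black_box(v.as_slice())) };
            s.len()
        })
        .sum()
}

/// Run a closure once and return its result with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Calling `timed(|| total_len_checked(&values))` and `timed(|| unsafe { total_len_unchecked(&values) })` on the same input gives two durations to compare. A single run like this is noisy, so treat it only as a sanity check.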

Describe alternatives you've considered

Perhaps it would be an unsafe option on the RowConverter:

let converter = RowConverter::new(...)?;

// Safety: only decoding Rows that came from valid String arrays
// (with_validate_utf8 is the proposed method; it does not exist yet)
let converter = unsafe {
  converter.with_validate_utf8(false)
};
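For discussion, here is one way the builder shape above could look. This is a self-contained toy, not arrow-rs code: `ToyConverter`, its `validate_utf8` field, and `decode_string` are all hypothetical stand-ins. The point is only that the `unsafe` sits on the opt-out, so the default stays safe.

```rust
use std::string::FromUtf8Error;

/// Hypothetical stand-in for a row converter; NOT the arrow-rs `RowConverter`.
#[derive(Debug, Clone)]
pub struct ToyConverter {
    /// Whether decoded string bytes are validated as UTF-8 (default: true).
    validate_utf8: bool,
}

impl ToyConverter {
    /// Create a converter that validates UTF-8 on decode.
    pub fn new() -> Self {
        Self { validate_utf8: true }
    }

    /// Enable or disable UTF-8 validation on decode.
    ///
    /// # Safety
    /// With `validate == false`, the caller must only decode bytes that
    /// were produced from valid UTF-8 strings.
    pub unsafe fn with_validate_utf8(mut self, validate: bool) -> Self {
        self.validate_utf8 = validate;
        self
    }

    /// Decode owned bytes into a `String`, validating only if configured to.
    pub fn decode_string(&self, bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
        if self.validate_utf8 {
            String::from_utf8(bytes)
        } else {
            // SAFETY: the caller promised valid UTF-8 when opting out
            // via `with_validate_utf8(false)`.
            Ok(unsafe { String::from_utf8_unchecked(bytes) })
        }
    }
}
```

Making the setter `unsafe` (rather than each decode call) keeps the promise in one place, which matches the "Safety:" comment in the snippet above.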
