Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #5374
@XiangpengHao implemented optimized row format --> ByteView (StringView / BinaryView) encoding/decoding in #5945 / #6044
It also adds benchmarks so we can test 🎉
However, as mentioned in https://github.com/apache/arrow-rs/pull/6044/files#r1676804033, if we know that the Row value was created from valid utf8 values, re-validating utf8 when decoding is unnecessary.
Describe the solution you'd like
Consider an API that would allow skipping utf8 validation when decoding Rows.
This would need to be justified by benchmarks showing that it makes a significant difference in performance.
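For illustration, the two decoding paths being compared can be sketched with the standard library alone. This is a minimal sketch, not arrow-rs code: `decode_checked` and `decode_unchecked` are hypothetical helper names, and the byte buffer stands in for string bytes recovered from a Row.

```rust
use std::str;

/// Decode with validation: scans every byte to confirm it is valid UTF-8.
fn decode_checked(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

/// Decode without validation (hypothetical helper, not an arrow-rs API).
///
/// # Safety
/// `bytes` must be valid UTF-8, e.g. because it was produced by encoding
/// a value that was already a valid string.
unsafe fn decode_unchecked(bytes: &[u8]) -> &str {
    unsafe { str::from_utf8_unchecked(bytes) }
}

fn main() {
    // Bytes that originated from a valid string, as in the Row round trip.
    let original = "héllo wörld";
    let encoded: Vec<u8> = original.as_bytes().to_vec();

    let checked = decode_checked(&encoded).expect("valid UTF-8");
    // Safety: `encoded` was copied from a `&str`, so it is valid UTF-8.
    let unchecked = unsafe { decode_unchecked(&encoded) };

    // Both paths yield the same string; only the validation scan differs.
    assert_eq!(checked, unchecked);
    println!("{unchecked}");
}
```

Whether skipping the scan matters in practice is exactly what the benchmarks would need to show.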
Describe alternatives you've considered
Perhaps it would be an unsafe option on the RowConverter
    let converter = RowConverter::new(...);
    // Safety: only decoding Rows that came from valid String arrays
    let converter = unsafe {
        converter.with_validate_utf8(false)
    };
Additional context
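As a rough sketch of how such an unsafe builder option could behave, here is a self-contained toy version. `ToyConverter` and its `decode` method are hypothetical stand-ins, not the arrow-rs `RowConverter`; only the `with_validate_utf8` name is taken from the proposal above.

```rust
use std::string::FromUtf8Error;

/// Toy stand-in for a row decoder; NOT the arrow-rs `RowConverter`.
struct ToyConverter {
    validate_utf8: bool,
}

impl ToyConverter {
    /// Validation is on by default, so the safe path stays the default.
    fn new() -> Self {
        Self { validate_utf8: true }
    }

    /// Enable or disable UTF-8 validation on decode.
    ///
    /// # Safety
    /// If `validate` is `false`, the caller must only decode bytes that
    /// are known to be valid UTF-8.
    unsafe fn with_validate_utf8(mut self, validate: bool) -> Self {
        self.validate_utf8 = validate;
        self
    }

    /// Decode bytes into a `String`, validating only when configured to.
    fn decode(&self, bytes: &[u8]) -> Result<String, FromUtf8Error> {
        if self.validate_utf8 {
            String::from_utf8(bytes.to_vec())
        } else {
            // SAFETY: upheld by the caller of `with_validate_utf8(false)`.
            Ok(unsafe { String::from_utf8_unchecked(bytes.to_vec()) })
        }
    }
}

fn main() {
    let row_bytes = "valid utf8".as_bytes();

    // Default: validated decoding.
    let converter = ToyConverter::new();
    assert_eq!(converter.decode(row_bytes).unwrap(), "valid utf8");

    // Safety: only decoding bytes that came from a valid string.
    let converter = unsafe { ToyConverter::new().with_validate_utf8(false) };
    assert_eq!(converter.decode(row_bytes).unwrap(), "valid utf8");
}
```

Marking the builder method `unsafe` puts the obligation on the caller at configuration time, so the decode call itself can remain a safe function.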