Closed
Description
Describe the bug
../arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.arrow_file contains a UTF8 Arrow array encoded in big endian.
When this file is read by the arrow-rs implementation, the offsets buffer remains big endian, even though the code assumes the offsets are in native endianness. As a result, the offsets of the created arrow-rs buffer are incorrect on little-endian machines such as x86.
To Reproduce
See the test read_dictionary_be_not_implemented in #810.
It fails with: Length spanned by offsets in Utf8 (687865856) is larger than the values array size (41)
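The number in the error message is consistent with a missed byte swap: 687865856 is 0x29000000, which is 41 (0x29) with its four bytes reversed. The sketch below reproduces that arithmetic with the standard library only. The function name is hypothetical, and the idea that the last offset in the file is 41 is an inference from the error message, not something confirmed against the file.

```rust
/// Hypothetical helper: simulates a little-endian reader that takes the
/// big-endian encoding of `value` and interprets it without swapping bytes.
fn misread_be_as_le(value: i32) -> i32 {
    // `to_be_bytes` gives the bytes as they would sit in a big-endian file;
    // `from_le_bytes` is what a little-endian reader effectively does.
    i32::from_le_bytes(value.to_be_bytes())
}
```

Calling `misread_be_as_le(41)` yields 687865856, and `687865856_i32.swap_bytes()` yields 41 again, which matches the two numbers reported by the failing test.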
Expected behavior
The test should pass (likely by translating offsets from big endian to native endianness)
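A minimal sketch of the suggested translation, assuming the offsets arrive as a raw byte slice of 32-bit big-endian integers (Utf8 arrays use 32-bit offsets). The function name `offsets_from_be_bytes` is hypothetical and is not part of the arrow-rs API; a real fix would presumably operate on arrow-rs buffers and cover the other fixed-width types as well.

```rust
/// Hypothetical helper: decodes a buffer of big-endian i32 offsets into
/// native-endian values. Trailing bytes that do not form a full 4-byte
/// offset are ignored in this sketch.
fn offsets_from_be_bytes(raw: &[u8]) -> Vec<i32> {
    raw.chunks_exact(4)
        .map(|chunk| i32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect()
}
```

For example, the 12 bytes `[0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 41]` decode to the offsets `[0, 5, 41]` on any host, because `i32::from_be_bytes` performs the swap only when the host is little endian.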
Additional context
Found while adding validation in #810