IPC big endian offsets are not translated #859

@alamb

Describe the bug
../arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.arrow_file contains a UTF8 Arrow array somewhere encoded in big endian.

When this file is read by the arrow-rs implementation, the offsets buffer remains big endian, even though the code assumes the offsets buffer holds values in native endianness (i.e. the offsets of the created arrow-rs buffer are incorrect on little endian machines like x86).

To Reproduce
See the test read_dictionary_be_not_implemented in #810.

It fails with the error: Length spanned by offsets in Utf8 (687865856) is larger than the values array size (41)
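
The number in the error is consistent with a byte-order mix-up: 687865856 is 0x29000000, which is 41 (0x00000029) with its four bytes reversed. A minimal sketch of that arithmetic, using only the Rust standard library (the helper name `misread_be_as_le` is hypothetical and is not part of arrow-rs):

```rust
/// Hypothetical helper: take the big-endian byte representation of `value`
/// and reinterpret those same bytes as a little-endian i32, which is what
/// happens when a big-endian offsets buffer is read on x86 without swapping.
fn misread_be_as_le(value: i32) -> i32 {
    // Bytes as they would appear in a big-endian IPC file.
    let on_disk: [u8; 4] = value.to_be_bytes();
    // Bytes interpreted the way a little-endian reader would see them.
    i32::from_le_bytes(on_disk)
}
```

Under this reading, the last offset (41, matching the values array size) is stored big endian and then interpreted as little endian, which yields the 687865856 reported above.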

Expected behavior
The test should pass (likely by translating offsets from big endian to native endianness)
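
As a rough illustration of what "translating offsets" could mean, here is a sketch that decodes a raw buffer of big-endian 32-bit offsets (Utf8 uses 32-bit offsets) into native-endian values. This is not the arrow-rs reader code; the function name `offsets_from_be_bytes` is an assumption made for the example, and a real fix would need to handle other offset widths and buffer types as well.

```rust
/// Hypothetical helper: decode a byte buffer holding big-endian i32 offsets
/// into native-endian i32 values. Trailing bytes that do not form a full
/// 4-byte offset are ignored in this sketch.
fn offsets_from_be_bytes(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        // `from_be_bytes` yields the correct value on any host endianness.
        .map(|c| i32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

For example, the bytes `[0, 0, 0, 0, 0, 0, 0, 41]` decode to the offsets 0 and 41 on both little endian and big endian hosts.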

Additional context
Found while adding validation in #810
