
ByteView should be using signed types? #6172

@a10y

Description


Which part is this question about

Related to the continuation of StringView/BinaryView support in #6163

In arrow_data, the ByteView type is used to encapsulate this structure from the spec:

[Image: the 16-byte view structure from the Arrow columnar format specification]
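The layout can be sketched in plain Rust as follows. This is a minimal, hypothetical mirror type: SpecView, decode_view and le_i32 are illustrative names, not arrow-rs APIs. It assumes, as arrow-rs does for its views buffer, that each view is held as a u128 whose little-endian bytes are the 16 bytes of the spec layout. Every integer field is typed i32, as the spec requires.

```rust
/// Illustrative mirror of one 16-byte view as the spec describes it,
/// with every integer field typed as i32 (signed), not u32.
#[derive(Debug, PartialEq)]
pub enum SpecView {
    /// length <= 12: the value bytes live inline, zero-padded to 12 bytes.
    Inline { length: i32, data: [u8; 12] },
    /// length > 12: a 4-byte prefix plus the location of the full value.
    Ref { length: i32, prefix: [u8; 4], buffer_index: i32, offset: i32 },
}

/// Read 4 little-endian bytes starting at `at` as an i32.
fn le_i32(bytes: &[u8; 16], at: usize) -> i32 {
    i32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Decode a view held as a u128; its low 4 bytes (little-endian) are the length.
pub fn decode_view(raw: u128) -> SpecView {
    let bytes = raw.to_le_bytes();
    let length = le_i32(&bytes, 0);
    // Note: a length that only fits in u32 reads as negative here and is
    // misclassified as inline, which is the mismatch this issue is about.
    if length <= 12 {
        let mut data = [0u8; 12];
        data.copy_from_slice(&bytes[4..16]);
        SpecView::Inline { length, data }
    } else {
        SpecView::Ref {
            length,
            prefix: [bytes[4], bytes[5], bytes[6], bytes[7]],
            buffer_index: le_i32(&bytes, 8),
            offset: le_i32(&bytes, 12),
        }
    }
}
```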

Notably, the spec requires the integer fields (length, buffer index and offset) to be signed 32-bit integers, whereas ByteView uses u32.
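To make the mismatch concrete, here is a small sketch (as_spec_i32 and to_spec_i32 are hypothetical helper names). A u32 above i32::MAX has the same 4 bytes as a negative int32, so a reader that follows the spec sees a negative length or offset; a checked conversion catches the case instead.

```rust
use std::convert::TryFrom;

/// What a spec-conforming reader sees: the same 4 bytes read as int32.
pub fn as_spec_i32(field: u32) -> i32 {
    i32::from_le_bytes(field.to_le_bytes())
}

/// Checked narrowing: None if the value has no non-negative i32 representation.
pub fn to_spec_i32(field: u32) -> Option<i32> {
    i32::try_from(field).ok()
}
```

For example, a 3,000,000,000-byte length stored as u32 is read back as -1,294,967,296 by as_spec_i32, while to_spec_i32 returns None for it.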

The arrow-rs builder for GenericByteViewArray doesn't appear to range-check the block (buffer index), offset and length values it writes into each view. If I'm reading it correctly, you can happily construct a StringView array with arrow-rs whose values exceed i32::MAX, then pass it to PyArrow or Java over IPC and have it fail at runtime.
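One way to catch such arrays before they reach another implementation is a pass over the raw views. The sketch below is hypothetical (make_view and first_invalid_view are not arrow-rs functions). It assumes views are u128 values laid out as in the spec, with the length in the low 32 bits, then prefix, buffer index and offset.

```rust
/// Hypothetical helper: pack view fields into a u128 using the spec layout
/// (length in the low 32 bits, then prefix, buffer index, offset).
pub fn make_view(length: u32, prefix: u32, buffer_index: u32, offset: u32) -> u128 {
    (length as u128)
        | ((prefix as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Hypothetical pre-export check: index of the first view whose integer
/// fields would not fit in a non-negative i32, or None if all are valid.
pub fn first_invalid_view(views: &[u128]) -> Option<usize> {
    let limit = i32::MAX as u32;
    views.iter().position(|&raw| {
        let length = raw as u32;
        let buffer_index = (raw >> 64) as u32;
        let offset = (raw >> 96) as u32;
        // Inline views (length <= 12) keep value bytes in the upper 96 bits,
        // so only longer views have buffer_index/offset fields to check.
        length > limit || (length > 12 && (buffer_index > limit || offset > limit))
    })
}
```

The inline special case matters: the upper 12 bytes of an inline view are string data, so treating them as buffer index and offset would reject perfectly valid short strings.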

Describe your question

Should we either

  1. use i32 instead of u32 internally,
  2. add constraints to the builder methods so that strings larger than 2 GiB (i32::MAX bytes) cannot be added, or
  3. has someone already noticed and addressed this, so that it isn't actually a problem?
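Option 2 could look roughly like the guard below, run before a value is copied into the data buffer. This is a sketch under stated assumptions: check_append and ViewRangeError are hypothetical names, not part of the arrow-rs builder API, and a real builder would also need to account for starting new blocks.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum ViewRangeError {
    /// The value itself is longer than i32::MAX bytes.
    LengthTooLarge(usize),
    /// The value would start more than i32::MAX bytes into its data buffer.
    OffsetTooLarge(usize),
}

/// Hypothetical guard a builder's append method could call before writing a
/// value of `value_len` bytes into a data buffer already holding `buffer_len`
/// bytes. Returns the spec-conformant (length, offset); offset is None for
/// inline values, which are stored in the view rather than in a data buffer.
pub fn check_append(
    buffer_len: usize,
    value_len: usize,
) -> Result<(i32, Option<i32>), ViewRangeError> {
    let length =
        i32::try_from(value_len).map_err(|_| ViewRangeError::LengthTooLarge(value_len))?;
    if value_len <= 12 {
        return Ok((length, None));
    }
    let offset =
        i32::try_from(buffer_len).map_err(|_| ViewRangeError::OffsetTooLarge(buffer_len))?;
    Ok((length, Some(offset)))
}
```

Because the check only needs the lengths, it can reject an oversized value before any bytes are copied, rather than surfacing the problem later during IPC export.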
