
Offset Validation Nulls #1071

@tustvold


Describe the bug

ArrayDataBuilder performs several validations of the offsets array. In particular, it checks that the offsets are monotonically increasing and within the bounds of the values array.

However, my understanding is that null slots may have arbitrary offsets, so I think this validation might be overly strict.
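
The strict checks described above can be sketched in plain Rust, independent of arrow-rs. This is a minimal illustration assuming `i32` offsets (as used by `DataType::Utf8`); the function name `validate_offsets_strict` is hypothetical and this is not the actual `ArrayDataBuilder` code.

```rust
/// Hypothetical strict validator: every offset must lie within the values
/// buffer and offsets must never decrease, regardless of validity.
fn validate_offsets_strict(offsets: &[i32], values_len: usize) -> Result<(), String> {
    for (i, &o) in offsets.iter().enumerate() {
        // Bounds check: each offset must point into (or one past the end of) the values buffer.
        if o < 0 || o as usize > values_len {
            return Err(format!("offset {} at index {} out of bounds 0..={}", o, i, values_len));
        }
    }
    for (i, w) in offsets.windows(2).enumerate() {
        // Monotonicity check: slot i spans offsets[i]..offsets[i + 1].
        if w[0] > w[1] {
            return Err(format!("slot {} has decreasing offsets {} > {}", i, w[0], w[1]));
        }
    }
    Ok(())
}

fn main() {
    // Offsets from the reproducer: slot 0 goes from 2 down to 0.
    println!("{:?}", validate_offsets_strict(&[2, 0, 2, 2], 2));
    // A monotonic alternative with the same valid slot passes.
    println!("{:?}", validate_offsets_strict(&[0, 0, 2, 2], 2));
}
```

Under these rules the reproducer's offsets fail at slot 0, which is null, even though the only valid slot (1) spans an in-bounds range.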

To Reproduce

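```rust
use arrow::array::ArrayDataBuilder;
use arrow::buffer::MutableBuffer;
use arrow::datatypes::DataType;

fn main() {
    // Slot 0 is null with offsets 2..0 (decreasing), slot 1 is "ab" (0..2), slot 2 is null (2..2).
    let offsets: Vec<i32> = vec![2, 0, 2, 2];
    let validity = vec![false, true, false];
    let values = "ab";

    let mut offsets_buffer = MutableBuffer::new(offsets.len() * std::mem::size_of::<i32>());
    offsets_buffer.extend_from_slice(&offsets);

    // Packs the booleans into a validity bitmap.
    let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());

    let mut values_buffer = MutableBuffer::new(values.len());
    values_buffer.extend_from_slice(values.as_bytes());

    // Currently fails offset validation, so unwrap() panics.
    let arraydata = ArrayDataBuilder::new(DataType::Utf8)
        .len(validity.len())
        .add_buffer(offsets_buffer.into())
        .add_buffer(values_buffer.into())
        .null_bit_buffer(validity_buffer.into())
        .build()
        .unwrap();
}
```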
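
To make the reproducer's intent concrete, this plain-Rust sketch (no arrow-rs) decodes its buffers by hand. It assumes Arrow's LSB-first bit numbering for the validity bitmap, so `[false, true, false]` packs into the single byte `0b010`; the helper names `pack_validity` and `slot_ranges` are hypothetical.

```rust
/// Hypothetical helper: pack booleans into an LSB-first bitmap, one bit per slot.
fn pack_validity(validity: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (validity.len() + 7) / 8];
    for (i, &valid) in validity.iter().enumerate() {
        if valid {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Hypothetical helper: list each slot's (start, end, is_valid) triple.
fn slot_ranges(offsets: &[i32], bitmap: &[u8], len: usize) -> Vec<(i32, i32, bool)> {
    (0..len)
        .map(|i| {
            let valid = (bitmap[i / 8] & (1 << (i % 8))) != 0;
            (offsets[i], offsets[i + 1], valid)
        })
        .collect()
}

fn main() {
    let offsets = [2, 0, 2, 2];
    let bitmap = pack_validity(&[false, true, false]);
    let values = "ab";
    for (i, (start, end, valid)) in slot_ranges(&offsets, &bitmap, 3).into_iter().enumerate() {
        if valid {
            // Only valid slots are read from the values buffer.
            println!("slot {}: {:?}", i, &values[start as usize..end as usize]);
        } else {
            println!("slot {}: null (offsets {}..{})", i, start, end);
        }
    }
}
```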

Expected behavior

I would expect `build()` to succeed, since every non-null element has valid offsets.
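
The relaxed rule proposed here, checking offsets only for non-null slots, might look like the following plain-Rust sketch. It is not arrow-rs code; `validate_offsets_null_aware` is a hypothetical name, and validity is passed as a `&[bool]` rather than a packed bitmap for readability.

```rust
/// Hypothetical relaxed validator: only slots marked valid must have
/// in-bounds, non-decreasing offsets; null slots are not checked.
fn validate_offsets_null_aware(
    offsets: &[i32],
    validity: &[bool],
    values_len: usize,
) -> Result<(), String> {
    if offsets.len() != validity.len() + 1 {
        return Err("offsets must have len + 1 entries".to_string());
    }
    for (i, &valid) in validity.iter().enumerate() {
        if !valid {
            continue; // offsets of null slots are ignored
        }
        let (start, end) = (offsets[i], offsets[i + 1]);
        // start >= 0 and start <= end together imply end >= 0.
        if start < 0 || start > end || end as usize > values_len {
            return Err(format!("valid slot {} has bad range {}..{}", i, start, end));
        }
    }
    Ok(())
}

fn main() {
    // The reproducer's buffers: only slot 1 is valid, and it spans 0..2 of "ab".
    println!("{:?}", validate_offsets_null_aware(&[2, 0, 2, 2], &[false, true, false], 2));
}
```

The trade-off of such a rule is that any consumer of the values buffer must then consult validity before slicing, because a null slot's range may be reversed or out of bounds.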

Additional context

I encountered this whilst trying to write a reproducer for a related bug, in which the string comparison kernels panic when offsets are not monotonically increasing. That, in turn, came up whilst working on a Parquet string array decoder, where I was hoping to simply leave the offsets for null slots zero-initialized.
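
Why a non-monotonic offset can make a kernel panic is visible in a simplified comparison loop: in Rust, slicing `&values[start..end]` with `start > end` panics, even when the slot is null and its result would be discarded. The functions below are hypothetical stand-ins, not the actual arrow-rs comparison kernels.

```rust
use std::panic;

/// Hypothetical naive kernel: slices every slot, ignoring validity.
fn eq_scalar_naive(offsets: &[i32], values: &str, rhs: &str) -> Vec<bool> {
    offsets
        .windows(2)
        .map(|w| &values[w[0] as usize..w[1] as usize] == rhs)
        .collect()
}

/// Hypothetical null-aware kernel: slices only valid slots and returns None for nulls.
fn eq_scalar_null_aware(
    offsets: &[i32],
    validity: &[bool],
    values: &str,
    rhs: &str,
) -> Vec<Option<bool>> {
    validity
        .iter()
        .enumerate()
        .map(|(i, &valid)| {
            valid.then(|| &values[offsets[i] as usize..offsets[i + 1] as usize] == rhs)
        })
        .collect()
}

fn main() {
    let offsets = [2, 0, 2, 2];
    // Slot 0 spans 2..0, so the naive kernel panics on the slice.
    let naive = panic::catch_unwind(|| eq_scalar_naive(&offsets, "ab", "ab"));
    println!("naive kernel panicked: {}", naive.is_err());
    println!("{:?}", eq_scalar_null_aware(&offsets, &[false, true, false], "ab", "ab"));
}
```

`catch_unwind` is used only to demonstrate the panic; the panic message is still printed to stderr.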
