Closed
Description
Describe the bug
ArrayDataBuilder performs several validations on the offsets array. In particular, it checks that the offsets are monotonically increasing and lie within the bounds of the values array.
However, my understanding is that null slots can have arbitrary offsets, so I think this validation might be overly strict?
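To make the two checks concrete, here is a minimal std-only sketch of a strict offsets validation. The function name `validate_offsets_strict` is hypothetical and this is not the arrow crate's actual code; it only mirrors the behaviour described above (every offset in bounds, no offset smaller than the one before it).

```rust
/// Hypothetical strict check: every offset must lie within the values buffer
/// and must not be smaller than the preceding offset.
/// This is an illustrative sketch, not arrow's implementation.
fn validate_offsets_strict(offsets: &[i32], values_len: usize) -> Result<(), String> {
    let mut previous = 0i32;
    for (i, &offset) in offsets.iter().enumerate() {
        // Bounds check: an offset may point one past the last byte, but no further.
        if offset < 0 || offset as usize > values_len {
            return Err(format!(
                "offset {} at position {} is outside 0..={}",
                offset, i, values_len
            ));
        }
        // Monotonicity check, applied to every slot regardless of validity.
        if i > 0 && offset < previous {
            return Err(format!(
                "offset {} at position {} is smaller than the preceding offset {}",
                offset, i, previous
            ));
        }
        previous = offset;
    }
    Ok(())
}
```

Under this sketch, `validate_offsets_strict(&[2, 0, 2, 2], 2)` returns an error because the second offset decreases, even though that decrease only involves a null slot.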
To Reproduce
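use arrow::array::ArrayDataBuilder;
use arrow::buffer::MutableBuffer;
use arrow::datatypes::DataType;

// Three slots: null, "ab", null. Only slot 1 (offsets 0..2) is valid.
let offsets: Vec<i32> = vec![2, 0, 2, 2];
let validity = vec![false, true, false];
let values = "ab";

// Utf8 uses 4-byte (i32) offsets.
let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
offsets_buffer.extend_from_slice(&offsets);

// Pack the validity flags into a bitmap.
let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());

let mut values_buffer = MutableBuffer::new(values.len());
values_buffer.extend_from_slice(values.as_bytes());

// build() validates the offsets and returns an error, so unwrap() panics here.
let arraydata = ArrayDataBuilder::new(DataType::Utf8)
    .len(validity.len())
    .add_buffer(offsets_buffer.into())
    .add_buffer(values_buffer.into())
    .null_bit_buffer(validity_buffer.into())
    .build()
    .unwrap();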
Expected behavior
I would expect this not to error, as the non-null elements have valid offsets.
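The relaxed behaviour asked for here could look roughly like the following std-only sketch, which only inspects the offset pair of each non-null slot. The function name `validate_offsets_ignoring_nulls` and the plain `&[bool]` validity representation are assumptions made for illustration; this is not an existing arrow API.

```rust
/// Hypothetical relaxed check: only slots marked valid must have a
/// well-formed, in-bounds `[start, end)` range. Offsets that only
/// delimit null slots are not inspected.
fn validate_offsets_ignoring_nulls(
    offsets: &[i32],
    validity: &[bool],
    values_len: usize,
) -> Result<(), String> {
    // An array of n slots needs n + 1 offsets.
    if offsets.len() != validity.len() + 1 {
        return Err(format!(
            "expected {} offsets for {} slots, got {}",
            validity.len() + 1,
            validity.len(),
            offsets.len()
        ));
    }
    for (i, &is_valid) in validity.iter().enumerate() {
        if !is_valid {
            continue; // null slot: its offsets are ignored
        }
        let (start, end) = (offsets[i], offsets[i + 1]);
        if start < 0 || end < start || end as usize > values_len {
            return Err(format!(
                "slot {} has invalid range {}..{} for {} value bytes",
                i, start, end, values_len
            ));
        }
    }
    Ok(())
}
```

With the reproducer's inputs, `validate_offsets_ignoring_nulls(&[2, 0, 2, 2], &[false, true, false], 2)` returns `Ok(())`, since the only valid slot covers bytes `0..2`.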
Additional context
I encountered this whilst trying to produce a reproducer for a related bug, where the string comparison kernels panic when the offsets are not monotonically increasing. That bug in turn came up whilst working on a parquet string array decoder, where I was hoping to just leave the offsets for nulls zero-initialized.
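For contrast with zero-initialized offsets, one common way to keep offsets monotonic when nulls are present is to repeat the previous offset for each null slot, so that nulls become empty ranges. The sketch below is std-only; the function name and the `Option<usize>` input (byte length of each value, `None` for null) are hypothetical and are not taken from the decoder mentioned above.

```rust
/// Hypothetical helper: build string offsets from per-slot byte lengths,
/// where `None` marks a null slot. Null slots repeat the previous offset,
/// so the result never decreases.
/// Assumes the total length fits in an i32; overflow is not handled here.
fn offsets_with_nulls_carried_forward(lengths: &[Option<usize>]) -> Vec<i32> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut current = 0i32;
    offsets.push(current);
    for length in lengths {
        if let Some(len) = length {
            current += *len as i32;
        }
        // For a null slot `current` is unchanged, giving an empty range.
        offsets.push(current);
    }
    offsets
}
```

For the null, "ab", null layout used in the reproducer, `offsets_with_nulls_carried_forward(&[None, Some(2), None])` yields `[0, 0, 2, 2]`, which satisfies a strict monotonicity check at the cost of writing an offset for every null.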