Description
Describe the bug
In https://github.com/kylebarron/arro3 I'm exporting arrow-rs functionality for general Python use. I seem to have hit a bug importing sliced arrays.
In import_array_pycapsules (which is vendored from arrow-rs code here) I have:
pub(crate) fn import_array_pycapsules(
    schema_capsule: &Bound<PyCapsule>,
    array_capsule: &Bound<PyCapsule>,
) -> PyResult<(ArrayRef, Field)> {
    validate_pycapsule_name(schema_capsule, "arrow_schema")?;
    validate_pycapsule_name(array_capsule, "arrow_array")?;
    let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
    let array = unsafe { FFI_ArrowArray::from_raw(array_capsule.pointer() as _) };
    let array_data = unsafe { arrow::ffi::from_ffi(array, schema_ptr) }
        .map_err(|err| PyTypeError::new_err(err.to_string()))?;
    dbg!(array_data.offset());
    let field = Field::try_from(schema_ptr).map_err(|err| PyTypeError::new_err(err.to_string()))?;
    let array = make_array(array_data);
    dbg!(array.offset());
    Ok((array, field))
}
Note the two dbg! macros. When invoked from Python with a sliced pyarrow StructArray, the array offset is lost.
import pyarrow as pa
import pytest
from arro3.compute import struct_field
a = pa.array([1, 2, 3])
b = pa.array([3, 4, 5])
struct_arr = pa.StructArray.from_arrays([a, b], names=["a", "b"])
sliced = struct_arr.slice(1, 2)
sliced.offset # 1
pa.array(struct_field(sliced, [0]))
# <pyarrow.lib.Int64Array object at 0x10fa94700>
# [
#   1,
#   2
# ]
Note that the first two elements of a are kept, with the offset not used. I've isolated this to the two lines with dbg!. Those print:
[pyo3-arrow/src/ffi/from_python/utils.rs:84:5] array_data.offset() = 1
[pyo3-arrow/src/ffi/from_python/utils.rs:87:5] array.offset() = 0
In particular, make_array does not apply the parent array's offset to the child arrays when it builds a StructArray:
arrow-rs/arrow-array/src/array/struct_array.rs
Lines 296 to 311 in 80ed712
impl From<ArrayData> for StructArray {
    fn from(data: ArrayData) -> Self {
        let fields = data
            .child_data()
            .iter()
            .map(|cd| make_array(cd.clone()))
            .collect();
        Self {
            len: data.len(),
            data_type: data.data_type().clone(),
            nulls: data.nulls().cloned(),
            fields,
        }
    }
}
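To make the suspected mechanism concrete, here is a small pure-Python toy model of it. It is only a sketch: `ToyArrayData` and the two helper functions are hypothetical names invented for illustration, not arrow-rs or pyarrow APIs, and truncating the children to the parent's length in the first helper is an assumption chosen to match the output observed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyArrayData:
    """Hypothetical stand-in for ArrayData; not the real arrow-rs type."""
    length: int
    offset: int = 0
    values: Optional[List[int]] = None  # only set for primitive arrays
    children: List["ToyArrayData"] = field(default_factory=list)


def primitive_values(data: ToyArrayData) -> List[int]:
    """Logical values of a primitive array: apply its own offset and length."""
    return data.values[data.offset : data.offset + data.length]


def struct_fields_ignoring_offset(data: ToyArrayData) -> List[List[int]]:
    """Mimics the quoted impl: the parent's offset is never consulted.

    Assumption: children are truncated to the parent's length, which is
    what the observed [1, 2] output suggests.
    """
    return [primitive_values(child)[: data.length] for child in data.children]


def struct_fields_applying_offset(data: ToyArrayData) -> List[List[int]]:
    """Slices every child by the parent's offset and length."""
    out = []
    for child in data.children:
        sliced_child = ToyArrayData(
            length=data.length,
            offset=child.offset + data.offset,
            values=child.values,
        )
        out.append(primitive_values(sliced_child))
    return out


a = ToyArrayData(length=3, values=[1, 2, 3])
b = ToyArrayData(length=3, values=[3, 4, 5])
# Models struct_arr.slice(1, 2): parent offset 1, children left unsliced.
sliced = ToyArrayData(length=2, offset=1, children=[a, b])

lost = struct_fields_ignoring_offset(sliced)  # [[1, 2], [3, 4]]
kept = struct_fields_applying_offset(sliced)  # [[2, 3], [4, 5]]
```

In this model, `lost[0]` matches the incorrect result shown above, while `kept[0]` is what I would expect `struct_field(sliced, [0])` to return.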
To Reproduce
Here's how to reproduce the bug through arro3:
git clone https://github.com/kylebarron/arro3
cd arro3
git checkout 9673b62
poetry install
poetry run maturin develop -m arro3-core/Cargo.toml
poetry run maturin develop -m arro3-compute/Cargo.toml
poetry run pytest
I can try to reproduce this in pure Rust if needed, but that may not be possible: StructArray seems to always export an offset of 0, so this import path may be hard to exercise from Rust alone.
Expected behavior
Expected the array offset to be maintained.
Additional context