StructArray from ArrayData Ignores Offset #6151

@kylebarron

Describe the bug

In https://github.com/kylebarron/arro3 I'm exporting arrow-rs functionality for general Python use. I seem to have hit a bug importing sliced arrays.

In import_array_pycapsules (which is vendored from arrow-rs code here) I have:

pub(crate) fn import_array_pycapsules(
    schema_capsule: &Bound<PyCapsule>,
    array_capsule: &Bound<PyCapsule>,
) -> PyResult<(ArrayRef, Field)> {
    validate_pycapsule_name(schema_capsule, "arrow_schema")?;
    validate_pycapsule_name(array_capsule, "arrow_array")?;

    let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
    let array = unsafe { FFI_ArrowArray::from_raw(array_capsule.pointer() as _) };

    let array_data = unsafe { arrow::ffi::from_ffi(array, schema_ptr) }
        .map_err(|err| PyTypeError::new_err(err.to_string()))?;
    dbg!(array_data.offset());

    let field = Field::try_from(schema_ptr).map_err(|err| PyTypeError::new_err(err.to_string()))?;
    let array = make_array(array_data);

    dbg!(array.offset());
    Ok((array, field))
}

Note the two dbg! macros. When invoked from Python with a pyarrow StructArray, the array offset is lost.

import pyarrow as pa
import pytest
from arro3.compute import struct_field

a = pa.array([1, 2, 3])
b = pa.array([3, 4, 5])
struct_arr = pa.StructArray.from_arrays([a, b], names=["a", "b"])
sliced = struct_arr.slice(1, 2)
sliced.offset # 1
pa.array(struct_field(sliced, [0]))
# <pyarrow.lib.Int64Array object at 0x10fa94700>
# [
#   1,
#   2
# ]

Note that the result contains the first two elements of a ([1, 2]) rather than the expected [2, 3], so the offset of 1 was not applied. I've isolated this to the two lines with dbg!. Those print:

[pyo3-arrow/src/ffi/from_python/utils.rs:84:5] array_data.offset() = 1
[pyo3-arrow/src/ffi/from_python/utils.rs:87:5] array.offset() = 0

In particular, make_array dispatches to the From<ArrayData> impl for StructArray, which never reads the offset of the parent ArrayData:

impl From<ArrayData> for StructArray {
    fn from(data: ArrayData) -> Self {
        let fields = data
            .child_data()
            .iter()
            .map(|cd| make_array(cd.clone()))
            .collect();
        Self {
            len: data.len(),
            data_type: data.data_type().clone(),
            nulls: data.nulls().cloned(),
            fields,
        }
    }
}
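
To illustrate the failure mode without depending on arrow-rs, here is a minimal toy model. ToyStructData, ToyStructArray, and both conversion functions are hypothetical stand-ins, not arrow-rs types or a proposed patch. The sketch only assumes what the dbg! output above suggests: the parent carries the offset while the children arrive unsliced.

```rust
// Toy model only: every type and function here is a hypothetical stand-in,
// not part of the arrow-rs API.

/// Simplified stand-in for `ArrayData` of a struct: a logical window
/// (offset, len) over child columns that keep their full buffers.
#[derive(Debug)]
struct ToyStructData {
    offset: usize,
    len: usize,
    children: Vec<Vec<i64>>,
}

/// Simplified stand-in for `StructArray`, which stores a length but no offset.
#[derive(Debug)]
struct ToyStructArray {
    len: usize,
    fields: Vec<Vec<i64>>,
}

impl ToyStructArray {
    /// Reads the first `len` values of a child column.
    fn column(&self, i: usize) -> &[i64] {
        &self.fields[i][..self.len]
    }
}

/// Mirrors the shape of the conversion quoted above: children are taken
/// as-is, so the parent offset is silently dropped.
fn from_data_ignoring_offset(data: &ToyStructData) -> ToyStructArray {
    ToyStructArray {
        len: data.len,
        fields: data.children.clone(),
    }
}

/// One possible remedy in this toy model: slice every child by the parent
/// offset and length before storing it.
fn from_data_respecting_offset(data: &ToyStructData) -> ToyStructArray {
    let end = data.offset + data.len;
    ToyStructArray {
        len: data.len,
        fields: data
            .children
            .iter()
            .map(|child| child[data.offset..end].to_vec())
            .collect(),
    }
}

fn main() {
    // Same values as the pyarrow example: struct_arr.slice(1, 2).
    let data = ToyStructData {
        offset: 1,
        len: 2,
        children: vec![vec![1, 2, 3], vec![3, 4, 5]],
    };
    let buggy = from_data_ignoring_offset(&data);
    let fixed = from_data_respecting_offset(&data);
    println!("ignoring offset:   {:?}", buggy.column(0));
    println!("respecting offset: {:?}", fixed.column(0));
}
```

In this model, the first conversion reads from the start of each child, which matches the [1, 2] seen from Python, while the second yields the sliced values. Whether slicing the children is the right fix inside arrow-rs itself is not something this sketch establishes.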

To Reproduce

Here's how to reproduce the bug from the downstream arro3 repository:

git clone https://github.com/kylebarron/arro3
cd arro3
git checkout 9673b62
poetry install
poetry run maturin develop -m arro3-core/Cargo.toml
poetry run maturin develop -m arro3-compute/Cargo.toml
poetry run pytest

I can try to reproduce this in pure Rust if needed, but that may not be possible: StructArray seems to always export an offset of 0, so it may not be easy to trigger this import path without an external producer such as pyarrow.

Expected behavior

Expected the array offset to be maintained.

Labels: arrow (Changes to the arrow crate), bug
