[Python] Public API to consume objects supporting the PyCapsule Arrow C Data Interface

https://github.com/apache/arrow/pull/37797 is adding official dunder methods to expose the Arrow C Data/Stream Interface in Python using PyCapsules (https://github.com/apache/arrow/issues/34031 / https://github.com/apache/arrow/issues/35531).

In addition to official dunders to expose this to other libraries, we also need public APIs in pyarrow to import / consume such PyCapsules (or rather the objects implementing the dunders to give you the PyCapsule). 
https://github.com/apache/arrow/pull/37797 already added this to the `pa.array(..)`, `pa.record_batch(..)` and `pa.schema(..)` constructors, such that you can for example create a pyarrow array with `pa.array(obj)` given any object `obj` that supports the interface by defining `__arrow_c_array__`. 

But that is not yet complete: we certainly also need a way to construct a `RecordBatchReader`, for which no such factory function exists. For this, we could add a `from_` class method (similar to the existing `from_batches`), for example `RecordBatchReader.from_stream`?

* [x] #39217

(In addition there are also the Table, Field and DataType constructors, but those all have factory functions that could support this, similar to `pa.array(..)` et al.)

---

Secondly, I am also wondering if we want to provide APIs that accept PyCapsules directly, instead of an object that implements the dunders. For example, if you are a library that has data in Arrow-compatible memory and you want to convert it to pyarrow through the C Data Interface, you might want to pass a PyCapsule directly if your library doesn't expose a Python class that represents that data. That would avoid having to create a small wrapper class whose only purpose is to define the dunder for the pyarrow constructor (although writing such a class is of course not difficult).



