GH-39217: [Python] RecordBatchReader.from_stream constructor for objects implementing the Arrow PyCapsule protocol #39218
pitrou
left a comment
This is a good idea. Of course it needs some tests.
Should pyarrow.ipc.open_stream also accept PyCapsule producers?
    )

    if schema is not None:
        requested = schema.__arrow_c_schema__()
Do we also want to first test the presence of this method using hasattr as above?
Yes, good idea
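The `hasattr` guard suggested in this thread could look roughly like the sketch below. The helper name `requested_schema_capsule`, the error message, and the `FakeSchema` class are hypothetical and exist only for illustration; this is not the code that was merged.

```python
def requested_schema_capsule(schema):
    """Return the schema capsule to request, or None if no schema was given.

    Hypothetical helper: checks for the PyCapsule schema method before
    calling it, so unsupported objects fail with a clear TypeError.
    """
    if schema is None:
        return None
    if not hasattr(schema, "__arrow_c_schema__"):
        raise TypeError(
            f"Expected an object implementing __arrow_c_schema__, "
            f"got {type(schema).__name__}"
        )
    return schema.__arrow_c_schema__()


class FakeSchema:
    """Hypothetical stand-in for an Arrow-compatible schema object."""

    def __arrow_c_schema__(self):
        # A real producer would return a PyCapsule named "arrow_schema".
        return "capsule-placeholder"
```

Checking first turns an `AttributeError` deep inside the constructor into a `TypeError` that names the offending type.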
My understanding is that at the moment this function is meant to work with file-like objects (or the in-memory buffer representing it) for an IPC encapsulated message. That seems a bit different in scope, and I would say it's fine to keep that scope?
Fair enough. My concern was that
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit dc40e5f. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
…r objects implementing the Arrow PyCapsule protocol (apache#39218)

### Rationale for this change

In contrast to Array, RecordBatch and Schema, the C Stream (which maps to RecordBatchReader) has no equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol. For that reason, this proposes an explicit constructor class method: `RecordBatchReader.from_stream` (this is quite a generic name, so other name suggestions are certainly welcome).

### Are these changes tested?

TODO

* Closes: apache#39217

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Rationale for this change
In contrast to Array, RecordBatch and Schema, the C Stream (which maps to RecordBatchReader) has no equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.
For that reason, this proposes an explicit constructor class method: RecordBatchReader.from_stream (this is quite a generic name, so other name suggestions are certainly welcome).
Are these changes tested?
TODO