Improve C Data Interface and Add Integration Testing Entrypoints#5080
Improve C Data Interface and Add Integration Testing Entrypoints#5080tustvold merged 10 commits intoapache:masterfrom
Conversation
|
For now I have tested this locally against the C++ implementation, using these changes to the Archery machinery: |
arrow-integration-testing/src/lib.rs
Outdated
There was a problem hiding this comment.
This pattern would probably benefit from an API function. It's easy to get it wrong.
tustvold
left a comment
There was a problem hiding this comment.
Thank you for working on this, left some comments
|
Ok, I've now checked this works properly against C++, Go, Java and C#. Will mark ready for review. |
|
@tustvold Is it possible to query the amount of memory allocated by Arrow Rust? It is not mandatory but allows checking for memory leaks during integration testing. |
|
Not easily afaik, you could register a global allocator that supports this, but nothing out of the box. Rust's custom allocators are not yet stable |
37f87cd to
ddb35fb
Compare
Ok, I see. We can revisit this later. |
|
@tustvold What would you think of adding a macro like this to ease error reporting? |
No objections from me, although FYI #2725 tracks the broader issue that the error variants in arrow-rs are not particularly meaningful. |
tustvold
left a comment
There was a problem hiding this comment.
This is looking good, mostly just minor nits, although I think we should probably make from_ffi and from_ffi_and_data_type unsafe before merging
arrow-integration-testing/src/lib.rs
Outdated
There was a problem hiding this comment.
Perhaps we could get a doc comment explaining how this differs from ArrowFile? Perhaps LazyArrowFile might be a better name??
Alternatively I wonder if we could just make ArrowFile lazy, and have member functions for decoding a batch to RecordBatch by index or something
There was a problem hiding this comment.
I've renamed to LazyArrowFile, which indeed sounds better. As for ArrowFile, it's marked public, is it ok to change its API?
There was a problem hiding this comment.
This crate isn't published, so you can go wild 😄
There was a problem hiding this comment.
Ok, I did the suggested changes.
ddb35fb to
4deebe6
Compare
|
|
||
| let schema_ref = self.schema(); | ||
| // NOTE: this parses the FFI_ArrowSchema again on each iterator call; | ||
| // should probably use from_ffi_and_data_type() instead. |
There was a problem hiding this comment.
Perhaps a separate issue should be opened for this. @tustvold
674736f to
69c2715
Compare
|
I think I addressed all review comments now. |
|
Thanks once again 👍 |
…n Rust (#38799) ### Rationale for this change Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side: apache/arrow-rs#5080 ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: #38798 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ting on Rust (apache#38799) ### Rationale for this change Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side: apache/arrow-rs#5080 ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38798 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Arrow needs to be upgraded alongside pyo3 because they both link to python. Arrow's upgrade to pyo3 is in master but not yet released as of v51. apache/arrow-rs#5566 NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. Arrow's upgrade to pyo3 is in master but not yet released as of v51. apache/arrow-rs#5566 NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Which issue does this PR close?
Closes #4867.
Rationale for this change
Allow Rust to participate in integration testing for the C Data Interface, as explained here:
https://arrow.apache.org/docs/dev/format/Integration.html#example-c-data-interface
What changes are included in this PR?
Add C-compatible entrypoints for the C Data Interface integration testing machinery.
Add support for exporting and importing Intervals over the C Data Interface.
Allow importing from a FFI_ArrowArray and an existing DataType.
Fix handling of
null_countfield in the C ArrowArray struct.Are there any user-facing changes?
No.