Define a "arrow-pyrarrow" crate to implement the "pyarrow" feature.#7694
Merged
alamb merged 1 commit intoapache:mainfrom Jun 20, 2025
Merged
Define a "arrow-pyrarrow" crate to implement the "pyarrow" feature.#7694alamb merged 1 commit intoapache:mainfrom
alamb merged 1 commit intoapache:mainfrom
Conversation
kylebarron
reviewed
Jun 17, 2025
20ac057 to
c8e4396
Compare
This follows the pattern of other parts of the arrow-rs codebase: arrow-array, arrow-schema, etc. With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3). Issue: apache#7668.
c8e4396 to
4d04947
Compare
alamb
approved these changes
Jun 20, 2025
Contributor
There was a problem hiding this comment.
Thank you @brunal and @kylebarron -- this looks great to me
Contributor
I think the benefit of this PR is that other crates can use pyarrow without pulling in all of the |
Contributor
|
FYI @linhr |
Contributor
|
Thanks again @brunal and @kylebarron -- I merged this one in to get it into |
Contributor
|
I hit some error probably related to this PR when making a release candidate: |
alamb
added a commit
that referenced
this pull request
Jun 23, 2025
…yarrow` and `parquet-variant` (#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - #7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to #7394 - Related to #7736 - Related to #7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - #7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com//issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - #7736 - #7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
alamb
added a commit
that referenced
this pull request
Jun 27, 2025
# Which issue does this PR close? - Part of #7394 # Rationale for this change - #7694 adds a new `arrow-pyarrow` crate, so we need to add it to the list of things to release # What changes are included in this PR? # Are these changes tested? no, but they are doc only changes # Are there any user-facing changes? no
alamb
added a commit
that referenced
this pull request
Jul 7, 2025
…`arrow-pyarrow-testing` crate (#7742) # Which issue does this PR close? - Related to #7394 - Closes #7736 # Rationale for this change At its core, if someone isn't using / modifying the pyarrow integration for arrow-rs they shouldn't have to install / configure python to get the tests working in `arrow-rs` - after the change in #7694 Running `cargo test --workspace` now also runs tests that require python to be setup and the `pyarrow` module to be installed. This is problematic because: 1. Some people may not have that environment setup 2. Apparently you can not use virtualenvs with py03 in Mac due to PyO3/pyo3#1741 The second item was very confusing for me while I tried to debug what going on as I ket getting an error about pyarrow not being installed, even though it was installed in my `venv`: ``` thread 'test_to_pyarrow' panicked at arrow-pyarrow/tests/pyarrow.rs:43:6: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'pyarrow'"), traceback: None } ``` # What changes are included in this PR? 1. Move the tests that require pyarrow to be installed into `arrow-pyarrow-testing`, which is not part of the workspace and thus not run with `cargo test --all` 2. Remove `cargo test --exclude arrow-pyarrow` 3. Add documentation on rationale and hints about running the test # Frequently Asked Questions ## Why not add ` --exclude arrow-pyarrow` to `verify_release_candidate.sh`? While the minimal fix would be to add ` --exclude arrow-pyarrow` to verify_release_candidate.sh this requires all users of arrow to remember to add `--exclude arrow-pyarrow` to their tests even if they don't care about python ## Why not in `pyarrow-arrow-integration-testing` ? I did not put this test in `pyarrow-arrow-integration-testing` because that module doesn't compile for me with the stock python install Somehow python needs to be installed with the ability to make dynamic libraries that I haven't figured out and don't really want to. It seems maybe related to https://pyo3.rs/v0.18.1/getting_started#python (thanks to @Xuanwo for the pointer in PyO3/pyo3#2136 / apache/opendal#1675) ``` (venv) root@5e8d0406fabe:/arrow-rs/arrow-pyarrow-integration-testing# cargo test --test pyarrow warning: `/arrow-rs/arrow-pyarrow-integration-testing/.cargo/config` is deprecated in favor of `config.toml` note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml` Compiling target-lexicon v0.13.2 Compiling flatbuffers v25.2.10 Compiling pyo3-build-config v0.24.2 Compiling arrow-ipc v55.2.0 (/arrow-rs/arrow-ipc) Compiling pyo3-macros-backend v0.24.2 Compiling pyo3-ffi v0.24.2 Compiling pyo3 v0.24.2 Compiling pyo3-macros v0.24.2 Compiling arrow-pyarrow v55.2.0 (/arrow-rs/arrow-pyarrow) Compiling arrow v55.2.0 (/arrow-rs/arrow) Compiling arrow-pyarrow-integration-testing v0.1.0 (/arrow-rs/arrow-pyarrow-integration-testing) error: linking with `cc` failed: exit status: 1 | = note: "cc" "/tmp/rustc0jx15I/symbols.o" "<43 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libtest-*,libgetopts-*,libunicode_width-*,librustc_std_workspace_std-*}.rlib" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/{libarrow-7996898a6777f964.rlib,libarrow_row-63508de6e52f4d4d.rlib,libarrow_pyarrow-8b510eeadc952ad2.rlib,libpyo3-c463c3a2243eeab9.rlib,libmemoffset-836dc1ddd866c614.rlib,libpyo3_ffi-fbf18d9f712874be.rlib,libunindent-2b8a456e13fa9700.rlib,libarrow_json-a7b4960a4b4d1cb5.rlib,libsimdutf8-7e080cbee40e41cd.rlib,libserde_json-0288fe0f1ec1bcde.rlib,libindexmap-ebb707a4eec26692.rlib,libequivalent-4762261bc2781d11.rlib,libarrow_ipc-085ebaaded386ff8.rlib,libflatbuffers-1f88fdf138129305.rlib,libarrow_csv-e5d679eef2b85a1b.rlib,libcsv-2288f6dec5308d9c.rlib,libitoa-fa5c9b2503c605f5.rlib,libserde-33ccdec93d601cce.rlib,libcsv_core-122def45831e6a2c.rlib,libarrow_string-17ebd7a5409511da.rlib,libregex-97f4021e65bafbca.rlib,libregex_automata-b62a0db5ace54d45.rlib,libaho_corasick-547ec01718db652c.rlib,libregex_syntax-f3065c7bb7c4592a.rlib,libmemchr-547fa7a4048cbc2e.rlib,libarrow_cast-0b7117723b343c65.rlib,libatoi-c9a52adfe9dd2564.rlib,libryu-243c2c0ae3ed75b4.rlib,libbase64-1cab23258b68443b.rlib,liblexical_core-c2a41d0a6941285f.rlib,liblexical_write_float-9d65854ce5ab8f07.rlib,liblexical_write_integer-894216b914487c18.rlib,liblexical_parse_float-274078b1af50d567.rlib,liblexical_parse_integer-781bcb0a42285559.rlib,liblexical_util-4a71e416d58e0125.rlib,libstatic_assertions-4f12831487497211.rlib,libarrow_arith-00bcbf2ec2eb3322.rlib,libarrow_ord-fcdece9f7e87a9bf.rlib,libarrow_select-31b4c3cfa277427b.rlib,libarrow_array-c2dc23f827508dc6.rlib,libahash-d573f36c088b3179.rlib,libgetrandom-31b11224f8e2ea08.rlib,liblibc-045cef5bc264baa9.rlib,libonce_cell-83f4df333969eacb.rlib,libzerocopy-0db6330db7e4b762.rlib,libhashbrown-2c7527cd2fd4322d.rlib,libchrono-6d1bc7062186f166.rlib,libiana_time_zone-875b10f893e8f81d.rlib,libarrow_data-be090acb3cb83adb.rlib,libarrow_schema-25469a5878e8c886.rlib,libbitflags-2614952e3652d907.rlib,libarrow_buffer-8eb56dc26cbe25a3.rlib,libbytes-8b0f150f04d16150.rlib,libhalf-ed72603b54882276.rlib,libcfg_if-9dbfdc9eaf8f6a2d.rlib,libnum-436acb7880d5b290.rlib,libnum_iter-87f263003ea3e8dd.rlib,libnum_rational-d812b535c653cc6e.rlib,libnum_complex-c12c249f79450951.rlib,libnum_bigint-ff983ebd6646ce72.rlib,libnum_integer-f946a0e48063a631.rlib,libnum_traits-d0a5f363c632fb69.rlib}.rlib" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libstd-*,libpanic_unwind-*,libobject-*,libmemchr-*,libaddr2line-*,libgimli-*,librustc_demangle-*,libstd_detect-*,libhashbrown-*,librustc_std_workspace_alloc-*,libminiz_oxide-*,libadler2-*,libunwind-*,libcfg_if-*,liblibc-*,liballoc-*,librustc_std_workspace_core-*,libcore-*,libcompiler_builtins-*}.rlib" "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-L" "/tmp/rustc0jx15I/raw-dylibs" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-o" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/pyarrow-00909cd9e7866a35" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs" = note: some arguments are omitted. use `--verbose` to show all linker arguments = note: /usr/bin/ld: /arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/libarrow_pyarrow-8b510eeadc952ad2.rlib(arrow_pyarrow-8b510eeadc952ad2.8xxa5xo5oql7wlj24034o033n.rcgu.o): in function `<pyo3::instance::Borrowed<pyo3::types::tuple::PyTuple> as pyo3::call::PyCallArgs>::call_positional': /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/pyo3-0.24.2/src/call.rs:213: undefined reference to `PyObject_Call' ``` # Are there any user-facing changes? If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. --------- Co-authored-by: Copilot <[email protected]>
berkaysynnada
pushed a commit
to synnada-ai/arrow-rs
that referenced
this pull request
Jul 17, 2025
…yarrow` and `parquet-variant` (apache#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - apache#7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to apache#7394 - Related to apache#7736 - Related to apache#7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - apache#7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com/apache/issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - apache#7736 - apache#7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This follows the pattern of other parts of the arrow-rs codebase: arrow-array, arrow-schema, etc.
With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3).
It also allows interfacing with PyArrow without pulling in Arrow.
Which issue does this PR close?
Closes #7668.
Rationale for this change
Part of a codebase can use pyarrow without arrow pulling in pyo3 across the codebase.
Are there any user-facing changes?
Nope.