
Update Arrow bounds to >=15,<22#19592

Merged
rapids-bot[bot] merged 26 commits into rapidsai:branch-25.10 from bdice:update-arrow-bounds
Sep 2, 2025

Conversation

@bdice
Contributor

@bdice bdice commented Aug 5, 2025

Description

Updates Arrow bounds to >=15,<22.

This makes cuDF compatible with Arrow 20 and 21.
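As a sanity check, the new bounds can be exercised with plain tuple comparison; this is an illustrative sketch only (the `in_bounds` helper is hypothetical, not cuDF code or the actual packaging metadata):

```python
# Illustrative sketch (hypothetical helper, not cuDF code): checking a
# version string against the new Arrow bounds ">=15,<22" using plain
# tuple comparison on the dotted version components.
def in_bounds(version: str, lower=(15,), upper=(22,)) -> bool:
    """Return True when `version` satisfies >=15,<22."""
    parts = tuple(int(p) for p in version.split("."))
    return lower <= parts < upper

print(in_bounds("20.0.0"))  # Arrow 20 now accepted: True
print(in_bounds("21.0.0"))  # Arrow 21 now accepted: True
print(in_bounds("14.0.2"))  # below the lower bound: False
print(in_bounds("22.0.0"))  # excluded by the upper bound: False
```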

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested a review from a team as a code owner August 5, 2025 19:08
@bdice bdice requested review from nvdbaranec and ttnghia August 5, 2025 19:08
@bdice
Contributor Author

bdice commented Aug 5, 2025

I had to switch from including <arrow/util/tdigest.h> to what appears to be its replacement, <arrow/util/tdigest_internal.h>. I don't like that this change relies on a header marked as "internal"; we should investigate fixing that.

@bdice
Contributor Author

bdice commented Aug 5, 2025

Actually, this is not likely to work. The internal headers aren't supposed to be installed, so it may not exist. xref: apache/arrow#46721

@nvdbaranec Do you have thoughts on how to rewrite this to avoid the arrow::internal:: namespace?

https://github.com/rapidsai/cudf/blame/268344f0c267b9ef3d0a43ab3a36079cece298f4/cpp/tests/quantiles/percentile_approx_test.cpp#L62

@vyasr
Contributor

vyasr commented Aug 11, 2025

The test will be fixed when #19648 goes in.

@bdice
Contributor Author

bdice commented Aug 12, 2025

@AyodeAwe Can you provide a packaging review?

@Matt711
Contributor

Matt711 commented Aug 22, 2025

Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it
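For context on that error: `json.decoder.JSONDecodeError` with a line/column/char position is what `json.loads` raises when a response body is truncated mid-value. A minimal reproduction of the failure mode (the payload below is invented for illustration, not the actual mocked-S3 response):

```python
# Minimal reproduction of the failure mode seen in CI (the payload below
# is invented, not the real S3 response): json.loads raises
# JSONDecodeError with a position when the body is cut off mid-value.
import json

truncated = '{"Buckets": [{"Name": "bucket", "CreationDate": '
try:
    json.loads(truncated)
except json.JSONDecodeError as err:
    # Mirrors the shape of the CI message, e.g. "Expecting value: line 1 column ..."
    print(f"{err.msg}: line {err.lineno} column {err.colno} (char {err.pos})")
```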

@Matt711
Contributor

Matt711 commented Aug 22, 2025

> Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it

This test passed on a rerun

@Matt711
Contributor

Matt711 commented Aug 27, 2025

> Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it
>
> This test passed on a rerun

I think @Vyas mentioned offline that this flaky test should stop being flaky once #15163 is in. Is that accurate?

@vyasr
Contributor

vyasr commented Aug 29, 2025

> Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it
>
> This test passed on a rerun
>
> I think @Vyas mentioned offline that this flaky test should stop being flaky once #15163 is in. Is that accurate?

My GH handle is @vyasr 😂 but no, that will not fix it. We don't know the source of that issue yet.

@Matt711
Contributor

Matt711 commented Aug 29, 2025

> Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it
>
> This test passed on a rerun
>
> I think @Vyas mentioned offline that this flaky test should stop being flaky once #15163 is in. Is that accurate?
>
> My GH handle is @vyasr 😂 but no, that will not fix it. We don't know the source of that issue yet.

Lol (😭 )

So I've been telling people wrong all week. TBF I did say "may" and "IIRC" when I brought it up. Should I open an issue to track?

@vyasr
Contributor

vyasr commented Aug 29, 2025

> Still a failing parquet test: FAILED tests/input_output/test_s3.py::test_read_parquet[kvikio=ON-None-32] - json.decoder.JSONDecodeError: Expecting value: line 1 column 46 (char 45). I'll take a look at it
>
> This test passed on a rerun
>
> I think @Vyas mentioned offline that this flaky test should stop being flaky once #15163 is in. Is that accurate?
>
> My GH handle is @vyasr 😂 but no, that will not fix it. We don't know the source of that issue yet.
>
> Lol (😭 )
>
> So I've been telling people wrong all week. TBF I did say "may" and "IIRC" when I brought it up. Should I open an issue to track?

I can write up the issue with what I know (which honestly isn't a whole lot right now).

@vyasr
Contributor

vyasr commented Aug 29, 2025

Hmm wait sorry this looks like a different failure from the one that I was thinking of. I was thinking of a failure in io/test_json.py::test_write_json_basic. I'm not sure I've seen this one. Is there a thread somewhere that you recall this one being discussed?

@vyasr
Contributor

vyasr commented Sep 2, 2025

/merge


@rapids-bot rapids-bot bot merged commit 245f94a into rapidsai:branch-25.10 Sep 2, 2025
132 of 133 checks passed
@vyasr vyasr mentioned this pull request Sep 2, 2025
pxLi pushed a commit to NVIDIA/spark-rapids-jni that referenced this pull request Sep 3, 2025
Due to changes in apache/arrow#46912, when Arrow is built via FetchContent (which CPM uses) and Thrift is built via Arrow, the Thrift build is no longer nested inside the Arrow build. The changes in rapidsai/cudf#19592 update cudf to use a newer version of Arrow that includes the linked Arrow PR, so this PR will be required for Spark builds once that cudf PR is merged.

---------

Signed-off-by: Vyas Ramasubramani <[email protected]>
rapids-bot bot pushed a commit that referenced this pull request Sep 3, 2025
The latest PR to update our pyarrow pinnings (#19592) made us compatible with the latest version of Arrow. The update was a little bumpy, but the main issues had to do with 1) our improper use of Arrow APIs in our C++ tests and 2) a bug in our reading of v2 Parquet files. Actual usage of our library was fine, so users would have been OK using a newer version, and we might have caught the bugs in our Parquet support sooner. This PR proposes dropping the upper bound entirely, allowing us to automatically support future versions as they are released.
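Dropping the upper bound amounts to keeping only the `>=15` floor, so future Arrow majors pass automatically. Sketched below with the same tuple-comparison trick (the `satisfies_floor` helper is hypothetical, not the actual dependency metadata):

```python
# Illustrative sketch (hypothetical helper): with the upper bound dropped,
# only the ">=15" floor remains, so any future Arrow major satisfies it.
def satisfies_floor(version: str, lower=(15,)) -> bool:
    """Return True when `version` satisfies >=15 with no upper bound."""
    return tuple(int(p) for p in version.split(".")) >= lower

print(satisfies_floor("22.0.0"))  # a future major now passes: True
print(satisfies_floor("30.1.0"))  # so does any later release: True
print(satisfies_floor("14.0.2"))  # still below the floor: False
```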

There is no real need for us to upgrade the version of Arrow that our C++ builds against; if it's already working, then we can stick with it since we're primarily using it for testing. If the Spark team finds a reason to request an upgrade we can always bump the CMake pin, but [they also plan to move to nanoarrow eventually](NVIDIA/spark-rapids-jni#3268) so I doubt it'll be a priority.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #19870

Labels

feature request New feature or request non-breaking Non-breaking change

7 participants