
Upgrade to arrow 22 #3363

Merged
andygrove merged 8 commits into apache:master from spaceandtimefdn:bg_arrow_22
Sep 6, 2022

@avantgardnerio
Contributor

Which issue does this PR close?

Closes #3362.

Rationale for this change

We should keep DataFusion up to date with Arrow.

What changes are included in this PR?

  • Decimal precision is now always a u8.
  • Arrow's array builders are in the middle of being refactored, so some are constructed with new() and others with with_capacity().
  • pyo3 needed to be upgraded to 0.17.
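For downstream code, the first bullet mostly means narrowing precision values that used to be passed around as usize. A minimal std-only sketch of a checked conversion, assuming the 38-digit maximum precision of a 128-bit decimal; the helper to_decimal128_precision is hypothetical and not part of Arrow or DataFusion:

```rust
use std::convert::TryFrom;

/// Largest precision (number of decimal digits) a 128-bit decimal can hold.
const DECIMAL128_MAX_PRECISION: u8 = 38;

/// Hypothetical helper (not an Arrow API): narrow a `usize` precision to the
/// `u8` the upgraded decimal types expect, rejecting values outside 1..=38.
fn to_decimal128_precision(precision: usize) -> Result<u8, String> {
    // try_from fails instead of silently wrapping values above 255.
    let p = u8::try_from(precision)
        .map_err(|_| format!("precision {} does not fit in u8", precision))?;
    if p == 0 || p > DECIMAL128_MAX_PRECISION {
        return Err(format!(
            "precision {} is outside 1..={}",
            p, DECIMAL128_MAX_PRECISION
        ));
    }
    Ok(p)
}

fn main() {
    println!("{:?}", to_decimal128_precision(10)); // Ok(10)
    println!("{:?}", to_decimal128_precision(39)); // Err("precision 39 is outside 1..=38")
    println!("{:?}", to_decimal128_precision(300)); // Err("precision 300 does not fit in u8")
}
```

Using u8::try_from rather than an `as` cast keeps an oversized value from silently wrapping (300 as u8 is 44).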

Are there any user-facing changes?

No

@github-actions bot added the core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Changes to the physical-expr crates), and sql (SQL Planner) labels Sep 5, 2022
@codecov-commenter

Codecov Report

Merging #3363 (bb54c53) into master (4c948a3) will decrease coverage by 0.00%.
The diff coverage is 84.87%.

@@            Coverage Diff             @@
##           master    #3363      +/-   ##
==========================================
- Coverage   85.51%   85.50%   -0.01%     
==========================================
  Files         294      294              
  Lines       54120    54115       -5     
==========================================
- Hits        46279    46272       -7     
- Misses       7841     7843       +2     
Impacted Files Coverage Δ
...usion/core/src/avro_to_arrow/arrow_array_reader.rs 0.00% <0.00%> (ø)
datafusion/core/src/physical_optimizer/pruning.rs 94.75% <ø> (ø)
datafusion/core/src/test/mod.rs 83.48% <0.00%> (ø)
...sion/physical-expr/src/aggregate/count_distinct.rs 95.00% <ø> (ø)
datafusion/physical-expr/src/array_expressions.rs 50.00% <ø> (ø)
datafusion/proto/src/from_proto.rs 34.26% <20.00%> (ø)
...sion/core/src/physical_plan/file_format/parquet.rs 94.56% <33.33%> (ø)
datafusion/physical-expr/src/aggregate/median.rs 73.49% <50.00%> (ø)
datafusion/physical-expr/src/aggregate/sum.rs 69.79% <75.00%> (ø)
datafusion/common/src/scalar.rs 84.62% <77.77%> (-0.17%) ⬇️
... and 29 more


@avantgardnerio marked this pull request as ready for review September 5, 2022 20:47
@avantgardnerio
Contributor Author

@alamb, thanks for releasing arrow 22! Hopefully this helps release the next version of DataFusion :)

Member

@yjshen left a comment


I'm slightly concerned about discarding the StringBuilder capacity. StringBuilder::with_capacity(num_of_items, 1024) seems like a better choice to me.
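The review point is that a variable-length string builder has two capacities worth reserving: one offset slot per item and a byte budget for the string data. The std-only sketch below mimics that Arrow-style offsets-plus-values layout with a hypothetical SimpleStringBuilder; it illustrates what with_capacity(num_of_items, 1024) preallocates compared with new(), and is not Arrow's actual StringBuilder.

```rust
/// Hypothetical, simplified string builder using an Arrow-style layout:
/// value `i` occupies bytes `offsets[i]..offsets[i + 1]` of `values`.
struct SimpleStringBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl SimpleStringBuilder {
    /// No preallocation: both buffers grow (and reallocate) as values arrive.
    fn new() -> Self {
        Self::with_capacity(0, 0)
    }

    /// Reserve room for `item_capacity` values and `data_capacity` bytes up front.
    fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
        let mut offsets = Vec::with_capacity(item_capacity + 1);
        offsets.push(0); // leading offset, so value 0 starts at byte 0
        SimpleStringBuilder {
            offsets,
            values: Vec::with_capacity(data_capacity),
        }
    }

    /// Copy the string's bytes and record where it ends.
    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }

    /// Number of values appended so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Read value `i` back; each slice is a whole appended &str, so it is valid UTF-8.
    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("values are valid UTF-8")
    }
}

fn main() {
    let empty = SimpleStringBuilder::new();
    assert_eq!(empty.len(), 0);

    let names = ["alpha", "beta", "gamma"];
    // One offset slot per item, plus 1024 bytes of string data reserved up front.
    let mut b = SimpleStringBuilder::with_capacity(names.len(), 1024);
    for n in names.iter() {
        b.append_value(n);
    }
    assert!(b.values.capacity() >= 1024);
    println!("{} values, second = {}", b.len(), b.value(1)); // 3 values, second = beta
}
```

With new(), both vectors start empty and reallocate as values are appended; with_capacity trades a possibly unused 1024-byte reservation for fewer reallocations.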

Contributor

@alamb left a comment


LGTM -- thank you @avantgardnerio ❤️

Let me know if you want to make any additional changes; otherwise, I plan to merge this in later today or tomorrow.

Member

@yjshen left a comment


LGTM, Thanks @avantgardnerio!

@avantgardnerio
Contributor Author

> if you want to make any additional changes

If CI passes, I think all feedback has been addressed.

@avantgardnerio
Contributor Author

CI seems broken. @alamb or @andygrove, I'd appreciate it if you could kick it, please.

@avantgardnerio
Contributor Author

Okay, ready for merge. No idea why GitHub is taking so long today...

@andygrove merged commit d16457a into apache:master Sep 6, 2022
@andygrove
Member

Thanks @avantgardnerio!

@avantgardnerio deleted the bg_arrow_22 branch September 6, 2022 18:46
@ursabot

ursabot commented Sep 6, 2022

Benchmark runs are scheduled for baseline = c89b10f and contender = d16457a. d16457a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java



Development

Successfully merging this pull request may close these issues.

Upgrade to Arrow 22

7 participants