Upgrade DataFusion to arrow-rs/parquet 58.0.0 / object_store 0.13.0#19728
alamb wants to merge 60 commits into apache:main
| | alltypes_plain.parquet | 1851 | 8882 | 2 | page_index=false | | ||
| | alltypes_tiny_pages.parquet | 454233 | 269266 | 2 | page_index=true | | ||
| | lz4_raw_compressed_larger.parquet | 380836 | 1347 | 2 | page_index=false | | ||
| | alltypes_tiny_pages.parquet | 454233 | 269074 | 2 | page_index=true | |
I think this reduction in metadata size is a direct consequence of @WaterWhisperer's PR to improve PageEncoding representation
Run benchmarks
🤖: Benchmark completed
run benchmarks
run benchmark tpch
🤖: Benchmark completed
🤖: Benchmark completed
| let timestamp = Utc::now(); | ||
| let range = options.range.clone(); | ||
|
|
||
| let head = options.head; |
A substantial amount of the changes in this PR are due to the upgrade to object_store 0.13, where several of the trait methods (e.g. get, get_opts, head, etc.) have been consolidated.
You can see the upgrade guide here: https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html#upgrade-guide-for-0130
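To illustrate the kind of consolidation described above, here is a minimal, std-only sketch. It is not the real object_store API: `Store`, `StoreExt`, `GetOptions`, `GetResult` and `OneObject` are hypothetical stand-ins, and only the `head` and `range` options that appear in this PR's diffs are modeled. The idea is that an implementor provides a single `get_opts`, and the convenience calls become thin wrappers that set the right option.

```rust
use std::ops::Range;

/// Hypothetical options struct; only `head` and `range` are modeled.
#[derive(Debug, Clone, Default)]
pub struct GetOptions {
    pub head: bool,
    pub range: Option<Range<usize>>,
}

/// Hypothetical result: total object size plus any returned bytes.
#[derive(Debug, PartialEq)]
pub struct GetResult {
    pub size: usize,
    pub payload: Vec<u8>,
}

/// The single method an implementor has to provide.
pub trait Store {
    fn get_opts(&self, path: &str, options: GetOptions) -> Result<GetResult, String>;
}

/// Convenience methods expressed purely in terms of `get_opts`.
pub trait StoreExt: Store {
    fn get(&self, path: &str) -> Result<GetResult, String> {
        self.get_opts(path, GetOptions::default())
    }

    fn head(&self, path: &str) -> Result<usize, String> {
        let opts = GetOptions { head: true, ..Default::default() };
        Ok(self.get_opts(path, opts)?.size)
    }

    fn get_range(&self, path: &str, range: Range<usize>) -> Result<Vec<u8>, String> {
        let opts = GetOptions { range: Some(range), ..Default::default() };
        Ok(self.get_opts(path, opts)?.payload)
    }
}

// Every `Store` gets the convenience methods for free.
impl<T: Store + ?Sized> StoreExt for T {}

/// Toy in-memory store holding a single object.
pub struct OneObject {
    pub path: String,
    pub data: Vec<u8>,
}

impl Store for OneObject {
    fn get_opts(&self, path: &str, options: GetOptions) -> Result<GetResult, String> {
        if path != self.path {
            return Err(format!("not found: {path}"));
        }
        let payload = if options.head {
            // No content for head requests, only metadata.
            Vec::new()
        } else if let Some(range) = options.range {
            // Bounded range request: return just the requested bytes.
            self.data
                .get(range.clone())
                .ok_or_else(|| format!("range {range:?} out of bounds"))?
                .to_vec()
        } else {
            // Plain get: return the whole object.
            self.data.clone()
        };
        Ok(GetResult { size: self.data.len(), payload })
    }
}
```

With this shape, a test store that overrides only `get_opts` has to handle the head and range cases itself, and a request log sees a head call as a get with `head=true`.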
|
|
||
| let props = WriterProperties::builder() | ||
| .set_max_row_group_size(2) | ||
| .set_max_row_group_row_count(Some(2)) |
This configuration was renamed in
| .location | ||
| .parts() | ||
| .last() | ||
| .next_back() |
clippy told me to do this -- I am not sure why it doesn't do so on main
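For context, the suggestion is likely clippy's `double_ended_iterator_last` lint: on an iterator that implements `DoubleEndedIterator`, `last()` walks every element, while `next_back()` reads the final one directly. A small std-only sketch, where `last_segment` is a hypothetical helper and a plain `str::split` stands in for the path parts iterator:

```rust
/// Returns the final `/`-separated segment of `path`.
/// Hypothetical helper for illustration only.
pub fn last_segment(path: &str) -> Option<&str> {
    // `split` on a `char` yields a `DoubleEndedIterator`, so `next_back()`
    // takes the last segment without consuming the earlier ones,
    // which is what `last()` would do.
    path.split('/').next_back()
}
```

Both calls return the same value; the lint only concerns the work done to get there.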
| _opts: PutOptions, | ||
| ) -> object_store::Result<PutResult> { | ||
| Err(object_store::Error::NotImplemented) | ||
| unimplemented!() |
The NotImplemented error now contains new fields about the operation that is not implemented. I just kept the code simple and used unimplemented!() instead.
| Total Requests: 2 | ||
| - HEAD path=csv_table.csv | ||
| - GET path=csv_table.csv | ||
| - GET (opts) path=csv_table.csv head=true |
There are far fewer methods in the ObjectStore trait now, and the tests have been updated to reflect that. The actual calls are all still the same.
| // no content for head requests | ||
| GetResultPayload::Stream(stream::empty().boxed()) | ||
| } else if let Some(range) = options.range { | ||
| let GetRange::Bounded(range) = range else { |
I had to inline the implementation of head() and get_range() here
|
|
||
| let stop = if !self.include_upper_bound { | ||
| Date32Type::subtract_month_day_nano(stop, step) | ||
| Date32Type::subtract_month_day_nano_opt(stop, step).ok_or_else(|| { |
due to apache/arrow-rs#9144 from @cht42 (the underlying library now checks for overflow and returns None rather than panicking)
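The same pattern can be sketched with plain integers: a checked operation returns an `Option`, and `ok_or_else` turns the `None` case into an error for the caller. Here `subtract_days` is a hypothetical helper operating on a Date32-style day count, not the arrow-rs function from the diff, and the error type is simplified to `String`.

```rust
/// Subtracts `step_days` from a day count, reporting overflow as an
/// error instead of panicking. Hypothetical helper for illustration.
pub fn subtract_days(stop: i32, step_days: i32) -> Result<i32, String> {
    // `checked_sub` returns `None` on overflow; `ok_or_else` builds the
    // error message lazily, only when it is actually needed.
    stop.checked_sub(step_days)
        .ok_or_else(|| format!("overflow subtracting {step_days} days from date {stop}"))
}
```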
| 07)│ DataSourceExec │ | ||
| 08)│ -------------------- │ | ||
| 09)│ bytes: 1040 │ | ||
| 09)│ bytes: 1024 │ |
the size of parquet files changes slightly from version to version (as the embedded version changes, etc)
Ok, I think this PR is ready to go
run benchmarks
🤖: Benchmark completed
run benchmark clickbench_partitioned
🤖: Benchmark completed
Which issue does this PR close?
58.0.0 (January 2026) arrow-rs#8466
Outstanding issues
Rationale for this change
Keep DataFusion up to date (and test Arrow using DataFusion tests)
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?