chore: Prepare for DataFusion 48.0.0#1710
Conversation
|
failures are due to |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1710 +/- ##
============================================
+ Coverage 56.12% 58.60% +2.47%
- Complexity 976 1138 +162
============================================
Files 119 130 +11
Lines 11743 12663 +920
Branches 2251 2362 +111
============================================
+ Hits 6591 7421 +830
- Misses 4012 4058 +46
- Partials 1140 1184 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
native/core/src/execution/planner.rs
Outdated
| fun_expr, | ||
| vec![left, right], | ||
| data_type, | ||
| Field::new("foo", data_type, true), |
There was a problem hiding this comment.
There have been changes in DF to use Field rather than DataType. Comet does not use the field name.
There was a problem hiding this comment.
can we use name not_used or to put the comment like that field name is not used in Comet?
There was a problem hiding this comment.
I think the type is still used, so what about leaving a small comment instead?
There was a problem hiding this comment.
I updated this to use the function name as the return field name
| let arg_fields = coerced_types | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(i, dt)| Field::new(format!("arg{i}"), dt.clone(), true)) |
There was a problem hiding this comment.
We now have to provide Field instead of DataType
| if let (Some(filter), Some(data_schema)) = (cnf_data_filters, &data_schema) { | ||
| parquet_source = parquet_source.with_predicate(Arc::clone(data_schema), filter); | ||
| if let Some(filter) = cnf_data_filters { | ||
| parquet_source = parquet_source.with_predicate(filter); |
There was a problem hiding this comment.
There were changes to the predicate push-down API
| @@ -148,7 +148,7 @@ impl ExecutionPlan for CopyExec { | |||
| } | |||
|
|
|||
| fn statistics(&self) -> DataFusionResult<Statistics> { | |||
There was a problem hiding this comment.
should we also rename it to partition_statistics ?
|
|
||
| fn statistics(&self) -> Result<Statistics> { | ||
| self.input.statistics() | ||
| self.input.partition_statistics(None) |
There was a problem hiding this comment.
rename to partition_statistics ?
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove just some minor comments
native/Cargo.toml
Outdated
| arrow = { version = "55.1.0", features = ["prettyprint", "ffi", "chrono-tz"] } | ||
| async-trait = { version = "0.1" } | ||
| bytes = { version = "1.10.0" } | ||
| parquet = { version = "55.0.0", default-features = false, features = ["experimental"] } |
There was a problem hiding this comment.
parquet should be 55.1.0 ?
| &self, | ||
| _file_col_statistics: &[ColumnStatistics], | ||
| ) -> datafusion::common::Result<Vec<ColumnStatistics>> { | ||
| Ok(vec![]) |
There was a problem hiding this comment.
is this unused or just placeholder...?
There was a problem hiding this comment.
Column statistics are not used in Comet
|
Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. The perf improvement also addresses the unsigned int issue which will allow us to clean up a lot of code/tests. |
DataFusion main branch already uses arrow-rs 55.1.0: https://github.com/apache/datafusion/blob/963b649d1e021e2516c2d652a8aa9fc486a5a1ba/Cargo.toml#L91 |
Which issue does this PR close?
Closes #1709
Rationale for this change
Keep up-to-date with upstream changes.
What changes are included in this PR?
How are these changes tested?