chore: Add scanImpl attribute to CometScanExec#1746
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1746 +/- ##
============================================
+ Coverage 56.12% 58.60% +2.47%
- Complexity 976 1131 +155
============================================
Files 119 130 +11
Lines 11743 12663 +920
Branches 2251 2362 +111
============================================
+ Hits 6591 7421 +830
- Misses 4012 4058 +46
- Partials 1140 1184 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
scanImpl attribute to CometScanExec
| _.files.forall(_.urlEncodedPath.contains("p=0")))) | ||
| + case WholeStageCodegenExec(ColumnarToRowExec(InputAdapter( | ||
| + fs @ CometScanExec(_, _, _, partitionFilters, _, _, _, _, _, _)))) => | ||
| + fs @ CometScanExec(_, _, _, _, partitionFilters, _, _, _, _, _, _)))) => |
There was a problem hiding this comment.
CometScanExec has an extra field (scanImpl).
It would have been better to update the syntax here to remove this "variable binding" style of matching to pull out partitionFilters since we can already access it as fs.partitionFilters. However, this would have required more work to run scalastyle and regenerate diffs for all Spark versions, so I was lazy and just added an extra _.
| def containsNativeCometScan(plan: SparkPlan): Boolean = { | ||
| plan match { | ||
| case w: CometScanWrapper => containsNativeCometScan(w.originalPlan) | ||
| case scan: CometScanExec => scan.scanImpl == CometConf.SCAN_NATIVE_COMET |
There was a problem hiding this comment.
Do we need CometConf.COMET_NATIVE_SCAN_IMPL.get() == CometConf.SCAN_NATIVE_ICEBERG_COMPAT as well?
NATIVE_ICEBERG_COMPAT does not have any buffer reuse so I think this change is correct, but would just like to confirm that it does not affect Filter.
There was a problem hiding this comment.
We only need to use Comet's FilterExec if any of the scans are NATIVE_COMET due to the buffer re-use in that scan.
Which issue does this PR close?
Follows on from #1744
Rationale for this change
This PR adds a
scanImplattribute toCometScanExecand this is populated at planning time inCometScanRulebased onCometConf.COMET_NATIVE_SCAN_IMPL. All other references toCometConf.COMET_NATIVE_SCAN_IMPLhave been removed from the implementation code.This is a step toward supporting an "auto" mode in
CometScanRuleso that we can pick the best scan implementation and potentially have multiple scan implementations active within the same query.What changes are included in this PR?
Add
scanImplattribute toCometScanExecand propagate this to other code.How are these changes tested?
Existing tests.