feat: Hook DataFusion Parquet native scan with Comet execution#1094
feat: Hook DataFusion Parquet native scan with Comet execution#1094viirya merged 6 commits intoapache:comet-parquet-execfrom
Conversation
|
Basic query is okay now. But this doesn't completely work for all queries right now. It still gets a few test failures in CometExecSuite. I'm looking into that. |
| |""".stripMargin | ||
| } | ||
|
|
||
| lazy val inputRDD: RDD[InternalRow] = { |
There was a problem hiding this comment.
Shouldn't we return an rdd here (perhaps CometExecRDD) because in CometNativeExec we will call executeColumnar which will use this?
There was a problem hiding this comment.
No, we won't call executeColumnar on CometNativeScanExec anymore. If you run it, it runs the original JVM frontend instead of native scan. It is similar to other native operators.
native/core/src/execution/jni_api.rs
Outdated
| let num_rows = output_batch.num_rows(); | ||
|
|
||
| if results.len() != num_cols { | ||
| if results.len() < num_cols { |
There was a problem hiding this comment.
Why can we now get more output columns (i.e., results.len() > num_cols) than before?
There was a problem hiding this comment.
A failed query has no output on the scan. I.e., the query asks for empty column. But the native scan still outputs all columns. It should be more reasonable to output no column, actually. Let me see if it could be fixed at the native scan.
There was a problem hiding this comment.
I changed this back after pushing empty projection to the native scan.
|
A few of test failures in CometExecSuite are due to partition values are not handled for now. It requires some more changes to this branch. I will go to work on it in other PR. |
|
cc @andygrove |
|
Thanks @mbutrovich |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?