Skip to content

chore: Enable CI checks for native_datafusion scan#1479

Merged
andygrove merged 6 commits intoapache:mainfrom
andygrove:native-datafusion-ci
Mar 7, 2025
Merged

chore: Enable CI checks for native_datafusion scan#1479
andygrove merged 6 commits intoapache:mainfrom
andygrove:native-datafusion-ci

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Mar 6, 2025

Which issue does this PR close?

Part of #1486

Rationale for this change

To prevent regressions and track progress with supporting native_datafusion, I would like to start enabling CI runs with this scan as the default.

What changes are included in this PR?

How are these changes tested?

@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 58.77%. Comparing base (f09f8af) to head (61a3071).
Report is 63 commits behind head on main.

Files with missing lines Patch % Lines
...on/src/main/scala/org/apache/comet/CometConf.scala 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1479      +/-   ##
============================================
+ Coverage     56.12%   58.77%   +2.64%     
- Complexity      976     1025      +49     
============================================
  Files           119      122       +3     
  Lines         11743    12279     +536     
  Branches       2251     2311      +60     
============================================
+ Hits           6591     7217     +626     
+ Misses         4012     3904     -108     
- Partials       1140     1158      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove marked this pull request as ready for review March 6, 2025 18:41
@andygrove
Copy link
Member Author

@mbutrovich could you review when you have time


linux-test-native-datafusion-scan:
env:
COMET_PARQUET_SCAN_IMPL: "native_datafusion"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This job is a copy of the regular job but enabling this env var

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need spark sql tests as well

Comment on lines +87 to +88
spark-version: ['3.5']
scala-version: ['2.12']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other versions? Spark 3.4, 4.0, Scala 2.13?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that this would be enough for now to prevent regressions while this is still an experimental feature.

We could increase the matrix here but it will add overhead to all PRs. What do you think @mbutrovich?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested Spark 3.5, Spark 4.0, Scala 2.13 with native_datafusion and would prefer not to get spread too thin if we get a deluge of issues. I propose walking before we run and get this pipeline in great shape and then add more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kazuyukitanimura Could we merge this PR to add these initial checks. We currently have almost zero tests running in CI for native_datafusion, so this is a big step forward. It is just the first step though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, would you mind filing an issue to track this? I will go ahead and approve

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I filed #1486

@andygrove
Copy link
Member Author

I think we will need spark sql tests as well

Eventually, yes. I don't think we can support that yet, though.

@andygrove andygrove merged commit ce5de44 into apache:main Mar 7, 2025
78 checks passed
@andygrove andygrove deleted the native-datafusion-ci branch March 7, 2025 23:41
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants