fix: [native_scans] Support CASE_SENSITIVE when reading Parquet #1782

Merged
andygrove merged 5 commits into apache:main from andygrove:case-sensitive-scan
May 27, 2025
@andygrove
Member

Which issue does this PR close?

Closes #1574

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@andygrove
Member Author

@wForget fyi

Contributor

@parthchandra parthchandra left a comment

lgtm

@codecov-commenter
codecov-commenter commented May 23, 2025

Codecov Report

Attention: Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 58.56%. Comparing base (f09f8af) to head (c592764).
Report is 213 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...va/org/apache/comet/parquet/NativeBatchReader.java | 0.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1782      +/-   ##
============================================
+ Coverage     56.12%   58.56%   +2.43%     
- Complexity      976     1140     +164     
============================================
  Files           119      130      +11     
  Lines         11743    12718     +975     
  Branches       2251     2371     +120     
============================================
+ Hits           6591     7448     +857     
- Misses         4012     4079      +67     
- Partials       1140     1191      +51     

@mbutrovich
Contributor

This PR reads it from the SQLConf, serializes it to the native side, and stashes it in SparkParquetOptions. I don't see what happens with it after that. Presumably we could then use this info in the SchemaAdapter but I don't see that logic anywhere. I think it's just a dead-end config like is_adapting_schema right now.

@andygrove
Member Author

> This PR reads it from the SQLConf, serializes it to the native side, and stashes it in SparkParquetOptions. I don't see what happens with it after that. Presumably we could then use this info in the SchemaAdapter but I don't see that logic anywhere. I think it's just a dead-end config like is_adapting_schema right now.

We do have code that uses this config:

                    if self.parquet_options.case_sensitive {
                        b.name() == field.name()
                    } else {
                        b.name().to_lowercase() == field.name().to_lowercase()
                    }

The newly added test was previously failing for native_datafusion (but not for native_iceberg_compat for some reason).
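For readers following along, the comparison quoted above can be illustrated in isolation. This is only a minimal sketch, not the Comet implementation: `ParquetOptions`, `names_match`, and `find_column` are hypothetical names made up for illustration, and only the `case_sensitive` flag is modeled.

```rust
/// Hypothetical stand-in for the options struct mentioned in this thread;
/// only the `case_sensitive` flag is modeled here.
struct ParquetOptions {
    case_sensitive: bool,
}

/// Returns true when a column name from the file matches the requested
/// field name under the configured case sensitivity.
fn names_match(options: &ParquetOptions, file_name: &str, requested: &str) -> bool {
    if options.case_sensitive {
        file_name == requested
    } else {
        file_name.to_lowercase() == requested.to_lowercase()
    }
}

/// Returns the index of the first file column matching `requested`, if any.
/// (Illustrative helper; not part of the PR.)
fn find_column(
    options: &ParquetOptions,
    file_columns: &[&str],
    requested: &str,
) -> Option<usize> {
    file_columns
        .iter()
        .position(|name| names_match(options, *name, requested))
}
```

With `case_sensitive: true`, looking up `id` in a file whose columns are `ID` and `name` finds nothing; with `case_sensitive: false` it resolves to the first column. Note that in the case-insensitive mode this sketch simply picks the first match when two columns differ only by case; how the real scan handles that situation is not covered in this thread.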

Member

@wForget wForget left a comment

LGTM, thanks.

@andygrove andygrove merged commit de9f425 into apache:main May 27, 2025
69 checks passed
@andygrove andygrove deleted the case-sensitive-scan branch May 27, 2025 14:07
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
Development

Successfully merging this pull request may close these issues.

native_datafusion/native_iceberg_compat scans case sensitive

5 participants