fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet by andygrove · Pull Request #1782 · apache/datafusion-comet

andygrove · 2025-05-23T16:20:55Z

Which issue does this PR close?

Closes #1574

Rationale for this change

What changes are included in this PR?

How are these changes tested?

andygrove · 2025-05-23T16:46:32Z

@wForget fyi

parthchandra

lgtm

codecov-commenter · 2025-05-23T17:35:21Z

Codecov Report

Attention: Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 58.56%. Comparing base (f09f8af) to head (c592764).
Report is 213 commits behind head on main.

Files with missing lines	Patch %	Lines
...va/org/apache/comet/parquet/NativeBatchReader.java	0.00%	4 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1782      +/-   ##
============================================
+ Coverage     56.12%   58.56%   +2.43%     
- Complexity      976     1140     +164     
============================================
  Files           119      130      +11     
  Lines         11743    12718     +975     
  Branches       2251     2371     +120     
============================================
+ Hits           6591     7448     +857     
- Misses         4012     4079      +67     
- Partials       1140     1191      +51

mbutrovich · 2025-05-23T18:01:34Z

This PR reads it from the SQLConf, serializes it to the native side, and stashes it in SparkParquetOptions. I don't see what happens with it after that. Presumably we could then use this info in the SchemaAdapter but I don't see that logic anywhere. I think it's just a dead-end config like is_adapting_schema right now.

andygrove · 2025-05-23T18:12:48Z

> This PR reads it from the SQLConf, serializes it to the native side, and stashes it in SparkParquetOptions. I don't see what happens with it after that. Presumably we could then use this info in the SchemaAdapter but I don't see that logic anywhere. I think it's just a dead-end config like is_adapting_schema right now.

We do have code that uses this config:

    if self.parquet_options.case_sensitive {
        b.name() == field.name()
    } else {
        b.name().to_lowercase() == field.name().to_lowercase()
    }

The newly added test was previously failing for native_datafusion (but not for native_iceberg_compat for some reason).
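For readers following along, here is a minimal, self-contained sketch of the name comparison quoted above. It is not Comet's actual code: `ParquetOptions`, `names_match`, and `find_field` are hypothetical names introduced only for illustration, and the sketch assumes the option is a plain boolean that selects between exact and lowercased comparison.

```rust
/// Hypothetical stand-in for the options struct discussed in the thread.
/// Only the `case_sensitive` flag is modeled here.
#[derive(Debug, Clone, Copy)]
struct ParquetOptions {
    case_sensitive: bool,
}

/// Returns true when a column name from the file matches the requested
/// field name under the configured case sensitivity.
fn names_match(options: &ParquetOptions, file_name: &str, requested: &str) -> bool {
    if options.case_sensitive {
        // Exact comparison: "ID" and "id" are different columns.
        file_name == requested
    } else {
        // Case-insensitive comparison via lowercasing both sides.
        file_name.to_lowercase() == requested.to_lowercase()
    }
}

/// Returns the index of the first file column that matches the requested
/// field name, or `None` if no column matches.
fn find_field(options: &ParquetOptions, file_columns: &[&str], requested: &str) -> Option<usize> {
    file_columns
        .iter()
        .position(|&column| names_match(options, column, requested))
}

fn main() {
    let file_columns = ["ID", "Name"];
    let insensitive = ParquetOptions { case_sensitive: false };
    let sensitive = ParquetOptions { case_sensitive: true };

    // With case-insensitive matching, "id" resolves to the "ID" column.
    println!("{:?}", find_field(&insensitive, &file_columns, "id"));
    // With case-sensitive matching, "id" does not resolve to any column.
    println!("{:?}", find_field(&sensitive, &file_columns, "id"));
}
```

The point of the sketch is only that the same lookup gives different answers depending on the flag, which is why the setting has to reach the code that matches file columns to requested fields.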

wForget

LGTM, thanks.

andygrove added 3 commits May 23, 2025 10:18

support CASE_SENSITIVE

aad7c98

scalastyle

b85d58d

cleanup

5dae6bb

parthchandra approved these changes May 23, 2025

andygrove added 2 commits May 23, 2025 15:26

upmerge

7b0b5f1

fix case sensitivity within structs

c592764

wForget approved these changes May 26, 2025

andygrove merged commit de9f425 into apache:main May 27, 2025
69 checks passed

andygrove deleted the case-sensitive-scan branch May 27, 2025 14:07

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

fix: [native_scans] Support CASE_SENSITIVE when reading Parquet (ap…

e61179b

…ache#1782)

andygrove merged 5 commits into apache:main from andygrove:case-sensitive-scan
