Skip to content

fix: cast map types correctly in schema adapter#1771

Merged
andygrove merged 1 commit intoapache:mainfrom
parthchandra:complex-sql-tests-1754
May 23, 2025
Merged

fix: cast map types correctly in schema adapter#1771
andygrove merged 1 commit intoapache:mainfrom
parthchandra:complex-sql-tests-1754

Conversation

@parthchandra
Copy link
Contributor

@parthchandra parthchandra commented May 21, 2025

Which issue does this PR close?

Closes #1754

Rationale for this change

In schema_adapter Map types are cast using arrow's cast which assumes that all nested fields within the map are also castable. However, the check to verify if the fields can be cast does not take into account nested struct fields with a different number of fields.

What changes are included in this PR?

The PR changes the cast method to a copy of the original that calls the schema adapter's struct cast method which can handle such structs

How are these changes tested?

Unit test was already added. This PR enables the test

@parthchandra parthchandra marked this pull request as draft May 21, 2025 22:40
@parthchandra
Copy link
Contributor Author

Marking as draft. Spark diffs may need to be updated as well.

@codecov-commenter
Copy link

codecov-commenter commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.59%. Comparing base (f09f8af) to head (f22c55e).
Report is 207 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1771      +/-   ##
============================================
+ Coverage     56.12%   58.59%   +2.46%     
- Complexity      976     1131     +155     
============================================
  Files           119      130      +11     
  Lines         11743    12667     +924     
  Branches       2251     2363     +112     
============================================
+ Hits           6591     7422     +831     
- Misses         4012     4061      +49     
- Partials       1140     1184      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@parthchandra parthchandra marked this pull request as ready for review May 22, 2025 16:30
@parthchandra
Copy link
Contributor Author

This is ready for review. Fixes 16 test failures in core2 (both native_datafusion and native_iceberg_compat) without any regressions

@andygrove andygrove requested a review from comphead May 22, 2025 17:09
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @parthchandra!


/// Cast a map type to another map type. The same as arrow-cast except we recursively call our own
/// cast_array
pub(crate) fn cast_map_values(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need pub(crate)?

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthchandra

however I feel we need to add some more later. the test passed because

          checkSparkAnswer(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

and will prob fail for

          checkSparkAnswerAndOperator(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

@comphead
Copy link
Contributor

in followup PR I'll add a rust test case for map

@parthchandra parthchandra force-pushed the complex-sql-tests-1754 branch from 8eb843a to f22c55e Compare May 22, 2025 23:01
@parthchandra
Copy link
Contributor Author

Thanks @parthchandra

however I feel we need to add some more later. the test passed because

          checkSparkAnswer(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

and will prob fail for

          checkSparkAnswerAndOperator(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

Good point @comphead .

The test cannot use checkSparkAnswerAndOperator atm but should eventually.
The plan is not fully native for two reasons -
native_datafusion and native_iceberg_compat do not support DSV2 so the Scan falls back to Spark
with DSV2 the plan is

plan: *(1) Project [map_keys(map1#12).id2 AS map_keys(map1).id2#54]
+- *(1) ColumnarToRow
   +- BatchScan parquet file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-e0fd616e-7fdc-4d0a-be8a-e6453797e243[map1#12] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-e0..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<map1:map<struct<id2:bigint>,struct<id:bigint,id2:bigint,id3:bigint>>> RuntimeFilters: []

with DSV1 sources, the scan is Native but the Project is not.

plan: *(1) Project [map_keys(map1#82).id2 AS map_keys(map1).id2#118]
+- *(1) CometColumnarToRow
   +- CometNativeScan parquet [map1#82] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d0..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<map1:map<struct<id2:bigint>,struct<id:bigint,id2:bigint,id3:bigint>>>

@comphead
Copy link
Contributor

Filed #1781

@andygrove andygrove merged commit 969c013 into apache:main May 23, 2025
83 checks passed
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
@parthchandra parthchandra deleted the complex-sql-tests-1754 branch January 14, 2026 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys"

4 participants