Skip to content

fix: support map_keys#1788

Merged
comphead merged 4 commits intoapache:mainfrom
comphead:dev
May 27, 2025
Merged

fix: support map_keys#1788
comphead merged 4 commits intoapache:mainfrom
comphead:dev

Conversation

@comphead
Copy link
Contributor

Which issue does this PR close?

Closes #1781.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead comphead changed the title fix: support map_values, map_keys fix: support map_values, map_keys May 24, 2025
@codecov-commenter
Copy link

codecov-commenter commented May 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.46%. Comparing base (f09f8af) to head (711feda).
Report is 217 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1788      +/-   ##
============================================
+ Coverage     56.12%   59.46%   +3.34%     
- Complexity      976     1140     +164     
============================================
  Files           119      128       +9     
  Lines         11743    12531     +788     
  Branches       2251     2356     +105     
============================================
+ Hits           6591     7452     +861     
+ Misses         4012     3889     -123     
- Partials       1140     1190      +50     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@comphead comphead changed the title fix: support map_values, map_keys fix: support map_keys May 25, 2025
@comphead
Copy link
Contributor Author

map_values will be addressed in following PR. It got a correctness issue

- read map[struct, struct] from parquet *** FAILED *** (1 second, 478 milliseconds)
  Results do not match for query:
  Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
  Timezone Env: 
  
  == Parsed Logical Plan ==
  Project [map_values(map1#508970).id2 AS map_values(map1).id2#509026]
  +- SubqueryAlias tbl
     +- View (`tbl`, [map1#508970])
        +- Relation [map1#508970] parquet
  
  == Analyzed Logical Plan ==
  map_values(map1).id2: array<bigint>
  Project [map_values(map1#508970).id2 AS map_values(map1).id2#509026]
  +- SubqueryAlias tbl
     +- View (`tbl`, [map1#508970])
        +- Relation [map1#508970] parquet
  
  == Optimized Logical Plan ==
  Project [map_values(map1#508970).id2 AS map_values(map1).id2#509026]
  +- Relation [map1#508970] parquet
  
  == Physical Plan ==
  *(1) CometColumnarToRow
  +- CometProject [map_values(map1).id2#509026], [map_values(map1#508970).id2 AS map_values(map1).id2#509026]
     +- CometScan parquet [map1#508970] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/__w/datafusion-comet/datafusion-comet/spark/target/tmp/spark-062..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<map1:map<struct<id:bigint,id2:bigint,id3:bigint>,struct<id2:bigint>>>
  
  == Results ==
  
  == Results ==
  !== Correct Answer - 5 ==                     == Spark Answer - 5 ==
   struct<map_values(map1).id2:array<bigint>>   struct<map_values(map1).id2:array<bigint>>
  ![ArrayBuffer(3)]                             [ArrayBuffer(0)]
  ![ArrayBuffer(4)]                             [ArrayBuffer(3)]
  ![ArrayBuffer(null)]                          [ArrayBuffer(4)]
   [null]                                       [null]
   [null]                                       [null] (QueryTest.scala:244)

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. (Some minor comments)

let session_ctx = SessionContext::new();

// generate test data in the temp folder
let test_data = "select map([named_struct('a', 1, 'b', 'n', 'c', 'x')], [named_struct('a', 1, 'b', 'n', 'c', 'x')]) c0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can we use a different struct for the key and the value, just to make it clear we are reading the correct fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

sqlConf: Seq[(String, String)] = Seq.empty,
debugCometDF: DataFrame => Unit = _ => ()): Unit = {
debugCometDF: DataFrame => Unit = _ => (),
checkCometOperator: Boolean = true): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not related directly, but this extends testing method functionality little bit to support tests which doesn't require all native comet operators. Used this extension to find that map_keys projection is not supported

select map(named_struct('a', 1, 'b', 'n', 'c', 'x'), named_struct('a', 1, 'b', 'n', 'c', 'x')) m
*/
#[tokio::test]
async fn test_nested_types_map_keys_by_index() -> Result<(), DataFusionError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what does the by_index part of the name refer to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

"select map_keys(c0).b from tbl")
}

// test("native reader - select nested field from a complex map key using map_values") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be ignored with a link to an issue rather than commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead

@comphead
Copy link
Contributor Author

Thanks everyone

@comphead comphead merged commit 38beead into apache:main May 27, 2025
68 checks passed
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
* fix: support `map_keys`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make map_keys work with projection

5 participants