
Make map_keys work with projection #1781


Description

@comphead

However, I feel we need to add some more tests later. The test passed because

          checkSparkAnswer(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

and will probably fail for

          checkSparkAnswerAndOperator(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

Good point @comphead.

The test cannot use checkSparkAnswerAndOperator at the moment, but it should eventually.
The plan is not fully native for two reasons:
native_datafusion and native_iceberg_compat do not support DSv2, so the Scan falls back to Spark.
With DSv2 the plan is:

plan: *(1) Project [map_keys(map1#12).id2 AS map_keys(map1).id2#54]
+- *(1) ColumnarToRow
   +- BatchScan parquet file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-e0fd616e-7fdc-4d0a-be8a-e6453797e243[map1#12] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-e0..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<map1:map<struct<id2:bigint>,struct<id:bigint,id2:bigint,id3:bigint>>> RuntimeFilters: []

With DSv1 sources, the Scan is native but the Project is not:

plan: *(1) Project [map_keys(map1#82).id2 AS map_keys(map1).id2#118]
+- *(1) CometColumnarToRow
   +- CometNativeScan parquet [map1#82] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d0..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<map1:map<struct<id2:bigint>,struct<id:bigint,id2:bigint,id3:bigint>>>
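For context, a minimal sketch of the test shape under discussion. This assumes the Comet test harness (a suite extending CometTestBase, which provides checkSparkAnswer and checkSparkAnswerAndOperator) and a pre-created Parquet table tbl with a column map1: map<struct<id2:bigint>, struct<id:bigint, id2:bigint, id3:bigint>>, matching the ReadSchema in the plans above; the test name and setup are hypothetical, not the actual test in the PR.

```scala
// Sketch only: assumes a suite extending CometTestBase and an existing
// Parquet table `tbl` whose schema matches the plans quoted above.
test("map_keys with struct field projection") {
  // Passes today: compares Comet's results against Spark's, but does not
  // require every operator in the plan to be native.
  checkSparkAnswer(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))

  // Stricter variant: additionally asserts that all operators in the plan
  // are Comet operators. Expected to fail until the Project over
  // map_keys(...) runs natively (see the plan dumps above).
  checkSparkAnswerAndOperator(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))
}
```

Once the Project is supported natively, the second call should replace the first so the test guards against silent fallbacks to Spark.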

Originally posted by @parthchandra in #1771 (comment)
