
Iceberg 0.13 with Spark 3.2 - list partitions query always needs partition.date and partition.hour columns in the result #4718

@abmo-x

Description

Steps to reproduce:

  1. Create an Iceberg table using the Hive metastore with `date` and `hour` partition columns.
  2. Insert a few records.
  3. Query the `partitions` metadata table for the table created above.
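
The steps above can be sketched in Spark SQL as follows. The report does not include the actual DDL, so the column set here is illustrative; the table name matches the queries shown below.

```sql
-- Hypothetical repro sketch; the id column and data values are illustrative.
CREATE TABLE spark_catalog.monitoring.test (
    id BIGINT,
    date STRING,
    hour STRING
) USING iceberg
PARTITIONED BY (date, hour);

INSERT INTO spark_catalog.monitoring.test VALUES
    (1, '2020-01-01', '00'),
    (2, '2020-01-01', '10'),
    (3, '2020-01-02', '00'),
    (4, '2020-01-02', '10');

-- List partitions via the metadata table:
SELECT * FROM spark_catalog.monitoring.test.partitions;
```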

Select queries against the `partitions` metadata table succeed only when both the `date` and `hour` columns appear in the projection. If either column, or both, is omitted, the query fails.

This is not reproducible with Spark 3.0 and Iceberg 0.11.

See the queries below:

**spark-sql>** select partition from spark_catalog.monitoring.test.partitions;
{"date":"2020-01-02","hour":"10"}
{"date":"2020-01-02","hour":"00"}
{"date":"2020-01-01","hour":"00"}
{"date":"2020-01-01","hour":"10"}
Time taken: 0.191 seconds, Fetched 4 row(s)


**spark-sql>** select partition.date, partition.hour from spark_catalog.monitoring.test.partitions;
2020-01-02	10
2020-01-02	00
2020-01-01	00
2020-01-01	10
Time taken: 0.198 seconds, Fetched 4 row(s)

**spark-sql>** select file_count from spark_catalog.monitoring.test.partitions; 
22/05/06 16:12:25 ERROR SparkSQLDriver: Failed in [select file_count from spark_catalog.monitoring.test.partitions]
java.lang.IllegalArgumentException: Cannot find source column: partition.date
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217) ~[ ]
	at org.apache.iceberg.PartitionSpec$Builder.findSourceColumn(PartitionSpec.java:374) ~[ ]
	at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:379) ~[ ]
	at org.apache.iceberg.BaseMetadataTable.lambda$transformSpec$0(BaseMetadataTable.java:68) ~[ ]
	at org.apache.iceberg.relocated.com.google.common.collect.ImmutableList.forEach(ImmutableList.java:405) ~[ ]
	at org.apache.iceberg.BaseMetadataTable.transformSpec(BaseMetadataTable.java:68) ~[ ]
	at org.apache.iceberg.PartitionsTable.planFiles(PartitionsTable.java:114) ~[ ]
	at org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:97) ~[ ]
	at org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:75) ~[ ]
	at org.apache.iceberg.PartitionsTable.access$300(PartitionsTable.java:35) ~[ ]
	at org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:138) ~[ ]
	at org.apache.iceberg.StaticTableScan.planFiles(StaticTableScan.java:66) ~[ ]
	at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:209) ~[ ]
	at org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:179) ~[ ]
	at org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:193) ~[ ]
	at org.apache.iceberg.spark.source.SparkBatchScan.planInputPartitions(SparkBatchScan.java:144) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions$lzycompute(BatchScanExec.scala:52) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions(BatchScanExec.scala:52) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:93) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:92) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:35) ~[ ]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:124) ~[ ]
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63) ~[ ]
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[ ]
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:68) ~[ ]
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) ~[ ]
	at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) ~[ ]
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[ ]
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:68) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:470) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$2(QueryExecution.scala:161) ~[ ]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:200) ~[ ]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:200) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:161) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:154) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:154) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:174) ~[ ]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:200) ~[ ]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:200) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:174) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:167) ~[ ]
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:167) ~[ ]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101) ~[ ]
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) ~[ ]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) ~[ ]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[ ]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[ ]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:383) ~[ ]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:503) ~[ ]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:497) ~[ ]
	at scala.collection.Iterator.foreach(Iterator.scala:943) [scala-library-2.12.15.jar:?]
	at scala.collection.Iterator.foreach$(Iterator.scala:943) [scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) [scala-library-2.12.15.jar:?]
	at scala.collection.IterableLike.foreach(IterableLike.scala:74) [scala-library-2.12.15.jar:?]
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73) [scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56) [scala-library-2.12.15.jar:?]
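
One plausible reading of the trace: `PartitionsTable.planFiles` calls `BaseMetadataTable.transformSpec`, which re-binds each identity partition field to a source column named `partition.<field>` in the metadata table's schema. When Spark's column pruning drops the `partition` struct (as when only `file_count` is selected), that lookup fails. The following is a hypothetical Python sketch of that mechanism, not Iceberg's actual code:

```python
# Hypothetical sketch (not Iceberg's actual implementation): illustrates why
# re-binding identity partition fields fails once the projection drops the
# `partition` struct from the partitions metadata table.

def transform_spec(projected_columns, partition_fields):
    """Re-bind identity partition fields against a projected column set."""
    spec = []
    for field in partition_fields:
        source = f"partition.{field}"
        if source not in projected_columns:
            # Mirrors: IllegalArgumentException: Cannot find source column: partition.date
            raise ValueError(f"Cannot find source column: {source}")
        spec.append(source)
    return spec

# Succeeds when the partition struct survives projection:
print(transform_spec({"partition.date", "partition.hour", "file_count"},
                     ["date", "hour"]))

# Fails when only file_count is projected, matching the failing query:
try:
    transform_spec({"file_count"}, ["date", "hour"])
except ValueError as err:
    print(err)  # prints: Cannot find source column: partition.date
```

This would also explain why `select partition from ...` and `select partition.date, partition.hour from ...` succeed: both keep the `partition` struct in the projected schema.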
