Skip to content

Core: Fix query failure when using projection on top of partitions metadata table#4720

Merged
RussellSpitzer merged 2 commits intoapache:masterfrom
singhpk234:fix/ICEBERG-4718
May 9, 2022
Merged

Core: Fix query failure when using projection on top of partitions metadata table#4720
RussellSpitzer merged 2 commits intoapache:masterfrom
singhpk234:fix/ICEBERG-4718

Conversation

@singhpk234
Copy link
Contributor

Closes #4718

As per my understanding, we should use scan.tableSchema() rather than scan.schema() for transformSpec when planFiles() for partitions metaData table.

Hence we faced :

22/05/06 16:12:25 ERROR SparkSQLDriver: Failed in [select file_count from spark_catalog.monitoring.test.partitions]
java.lang.IllegalArgumentException: Cannot find source column: partition.date

as present in reported ticket

Though in master we will get :

Cannot find source column: 1000
java.lang.NullPointerException: Cannot find source column: 1000


Testing Done :
Added UT fails without the change

cc @szehon-ho @RussellSpitzer

@singhpk234 singhpk234 changed the title [Core] : Fix query failure when using projection on top of partitions table Core: Fix query failure when using projection on top of partitions table May 7, 2022
@singhpk234 singhpk234 changed the title Core: Fix query failure when using projection on top of partitions table Core: Fix query failure when using projection on top of partitions metadata table May 7, 2022
LoadingCache<Integer, ManifestEvaluator> evalCache = Caffeine.newBuilder().build(specId -> {
PartitionSpec spec = table.specs().get(specId);
PartitionSpec transformedSpec = transformSpec(scan.schema(), spec);
PartitionSpec transformedSpec = transformSpec(scan.tableSchema(), spec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this case "Schema" is the projected schema which is missing all the partition columns and "tableSchema" has the full partitionSpec. LGTM

Assert.assertEquals("Metadata table should return one data file", 1, actualDataFiles.size());
TestHelpers.assertEqualsSafe(filesTableSchema.asStruct(), expectedDataFiles.get(0), actualDataFiles.get(0));

List<Row> actualPartitionsWithProjection =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 added a UT for the same, thanks @RussellSpitzer !!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Otherwise we wouldn't have an automated test running on this with "Core" only changes.

@abmo-x
Copy link
Contributor

abmo-x commented May 7, 2022

Thanks @singhpk234 for a quick turn around, appreciate it!

@RussellSpitzer RussellSpitzer merged commit 0390e17 into apache:master May 9, 2022
@RussellSpitzer
Copy link
Member

Thanks @singhpk234 for the PR and @abmo-x for review!

@szehon-ho
Copy link
Member

Arg, sorry about the initial bug, thanks @singhpk234 for the fix and all for review

szehon-ho added a commit to szehon-ho/iceberg that referenced this pull request May 28, 2022
rdblue pushed a commit that referenced this pull request May 29, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Iceberg 0.13 with Spark 3.2 - list partitions query always need partition.date and partition.hour columns in the result

4 participants