Spark: Use correct statistics file in SparkScan::estimateStatistics(Snapshot) #12482
Merged
findepi merged 1 commit into apache:main on Mar 20, 2025
Table::statisticsFiles() returns a List<StatisticsFile>. We need to get the StatisticsFile with the snapshotId of the Snapshot.
wypoon
commented
Mar 8, 2025
Map<String, Long> expectedNDV = Maps.newHashMap();
expectedNDV.put("id", 6L);
withSQLConf(reportColStatsEnabled, () -> checkColStatisticsReported(scan, 6L, expectedNDV));
Contributor
Author
The test is parameterized with three parameters. When it is run on its own without the fix, between one and three cases fail. The correct StatisticsFile sometimes appears first in the List because, when TableMetadata is built, the List is built from a Map, and the order of the entries depends on the hashing of the snapshotId (which is random).
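The ordering effect described above can be reproduced in isolation. The following is a minimal sketch, assuming a plain `java.util.HashMap` keyed by snapshot ID; the map type actually used when TableMetadata is built is not shown in this conversation, and `MapOrderDemo`, `idsInMapOrder`, and the IDs 3 and 5 are hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {

  // Builds a list from a HashMap keyed by (hypothetical) snapshot IDs.
  // The IDs are inserted in the order given, but the resulting list follows
  // the map's bucket order, which depends on the hash of each key.
  static List<Long> idsInMapOrder(long... snapshotIds) {
    Map<Long, String> bySnapshotId = new HashMap<>();
    for (long id : snapshotIds) {
      bySnapshotId.put(id, "stats-" + id);
    }
    return new ArrayList<>(bySnapshotId.keySet());
  }

  public static void main(String[] args) {
    // Insertion order differs, yet the first element is the same in both cases.
    System.out.println(idsInMapOrder(5L, 3L).get(0));
    System.out.println(idsInMapOrder(3L, 5L).get(0));
  }
}
```

Because real snapshot IDs are random, which entry lands first varies from run to run, so code that always reads the first element of the list would pick the right file only some of the time.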
Contributor
Author
@huaxingao can you please review this?
Contributor
Author
@findepi can you please review this simple fix?
findepi
approved these changes
Mar 20, 2025
Contributor
Author
Thanks @findepi!
This was referenced Mar 26, 2025
This fixes a bug in SparkScan::estimateStatistics(Snapshot). Table::statisticsFiles() returns a List<StatisticsFile>. We need to get the StatisticsFile with the snapshotId of the Snapshot for use in estimating the statistics.
I modified an existing test so that it fails without the fix and passes with it.
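The selection described above might look roughly like the sketch below. It is not the code from this pull request: `StatsFile` is a hypothetical stand-in for Iceberg's StatisticsFile that carries only a snapshot ID and a path, and `findForSnapshot` and the file paths are illustrative names.

```java
import java.util.List;
import java.util.Optional;

public class StatisticsFileLookup {

  // Hypothetical stand-in for Iceberg's StatisticsFile; only the fields this sketch needs.
  static final class StatsFile {
    final long snapshotId;
    final String path;

    StatsFile(long snapshotId, String path) {
      this.snapshotId = snapshotId;
      this.path = path;
    }
  }

  // Returns the statistics file recorded for the given snapshot, if any,
  // rather than assuming it is the first element of the list.
  static Optional<StatsFile> findForSnapshot(List<StatsFile> files, long snapshotId) {
    return files.stream().filter(f -> f.snapshotId == snapshotId).findFirst();
  }

  public static void main(String[] args) {
    List<StatsFile> files =
        List.of(new StatsFile(101L, "stats-101.puffin"), new StatsFile(202L, "stats-202.puffin"));
    // Look up by snapshot ID; the position in the list does not matter.
    System.out.println(findForSnapshot(files, 202L).map(f -> f.path).orElse("none"));
  }
}
```

Returning an `Optional` makes the case where a snapshot has no statistics file explicit, so the caller can fall back to estimating without column statistics.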