Flink: add unit tests for range distribution on bucket partition column #11033
// It takes 2 checkpoint cycles for statistics collection and application
// of the globally aggregated statistics in the range partitioner.
// The last two checkpoints should have range shuffle applied
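The comment implies that the assertion should only look at the trailing checkpoints. Here is a minimal sketch of that check. It assumes the test records the number of data files added by each completed checkpoint, in order. The class name, method name and input list are hypothetical and are not the actual Iceberg test code. The bound parameter reuses the name maxAddedDataFilesPerCheckpoint from the discussion below.

```java
import java.util.List;

/**
 * Hypothetical helper (not Iceberg's test code): checks only the last two
 * checkpoints, which are the ones expected to run with range shuffle applied.
 */
public class LastCheckpointsCheck {

  // Earlier checkpoints may still be collecting or applying the globally
  // aggregated statistics, so they are deliberately not checked.
  static final int RANGE_SHUFFLE_CHECKPOINTS = 2;

  static boolean lastCheckpointsWithinBound(
      List<Integer> addedFilesPerCheckpoint, int maxAddedDataFilesPerCheckpoint) {
    int n = addedFilesPerCheckpoint.size();
    if (n < RANGE_SHUFFLE_CHECKPOINTS) {
      throw new IllegalArgumentException(
          "need at least " + RANGE_SHUFFLE_CHECKPOINTS + " checkpoints, got " + n);
    }
    // Only inspect the trailing RANGE_SHUFFLE_CHECKPOINTS entries.
    for (int i = n - RANGE_SHUFFLE_CHECKPOINTS; i < n; i++) {
      if (addedFilesPerCheckpoint.get(i) > maxAddedDataFilesPerCheckpoint) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int numBuckets = 8;
    int parallelism = 4;
    // Illustrative input only, not measured results: the first two
    // checkpoints are unconstrained, the last two must respect the bound.
    List<Integer> counts = List.of(32, 20, 8, 9);
    System.out.println(lastCheckpointsWithinBound(counts, numBuckets + parallelism));
  }
}
```

Skipping the leading checkpoints keeps the assertion from depending on how quickly the statistics are aggregated, which is exactly the warm-up the comment describes.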
How stable is this test?
Do I understand correctly that you relaxed the conditions so that the test will never fail if the feature is correct?
Would this test fail on a slow machine (like the CI) with the feature turned off?
Yes. The relaxed condition sets maxAddedDataFilesPerCheckpoint to NUM_BUCKETS + parallelism, which range partitioning guarantees. In divisible scenarios (where one of NUM_BUCKETS and parallelism evenly divides the other), the count can be smaller, as low as NUM_BUCKETS or parallelism.
This test is guaranteed to fail without range partitioning, because each writer subtask can write up to NUM_BUCKETS files, so the total number of data files per commit can reach NUM_BUCKETS * parallelism.
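To make the arithmetic concrete, here is an idealized model. It is a hypothetical class, not Iceberg code, and it assumes two things: rows are spread evenly over the buckets, and the range partitioner gives each writer subtask one contiguous, equally weighted slice of the bucket range. Under that model, a writer opens one file per bucket its slice overlaps. The total therefore never exceeds NUM_BUCKETS + parallelism - 1, which stays within the test's NUM_BUCKETS + parallelism bound. It drops to NUM_BUCKETS or parallelism when one evenly divides the other. Without range shuffle, the worst case is NUM_BUCKETS * parallelism.

```java
/**
 * Idealized model (hypothetical, not Iceberg code) of how many data files
 * one checkpoint produces for a bucket-partitioned table.
 */
public class BucketFileCountModel {

  /**
   * Range shuffle: the bucket key space [0, numBuckets) is split into
   * `parallelism` contiguous, equally weighted slices, one per writer subtask.
   * Each writer opens one file per bucket that its slice overlaps.
   */
  static int rangeShuffleFiles(int numBuckets, int parallelism) {
    int files = 0;
    for (int subtask = 0; subtask < parallelism; subtask++) {
      // Positions are measured in units of 1/parallelism of a bucket so that
      // every boundary is an integer: the subtask covers
      // [subtask * numBuckets, (subtask + 1) * numBuckets).
      long subtaskStart = (long) subtask * numBuckets;
      long subtaskEnd = subtaskStart + numBuckets;
      for (int bucket = 0; bucket < numBuckets; bucket++) {
        // In the same units, the bucket covers [bucket * parallelism, (bucket + 1) * parallelism).
        long bucketStart = (long) bucket * parallelism;
        long bucketEnd = bucketStart + parallelism;
        // Positive-length overlap means this writer receives rows of this bucket.
        if (Math.max(subtaskStart, bucketStart) < Math.min(subtaskEnd, bucketEnd)) {
          files++;
        }
      }
    }
    return files;
  }

  /** Without range shuffle, in the worst case every writer sees every bucket. */
  static int worstCaseWithoutRangeShuffle(int numBuckets, int parallelism) {
    return numBuckets * parallelism;
  }

  public static void main(String[] args) {
    int[][] cases = {{8, 4}, {4, 8}, {5, 3}};
    for (int[] c : cases) {
      int numBuckets = c[0];
      int parallelism = c[1];
      System.out.printf(
          "NUM_BUCKETS=%d parallelism=%d rangeShuffle=%d bound=%d withoutRangeShuffle=%d%n",
          numBuckets,
          parallelism,
          rangeShuffleFiles(numBuckets, parallelism),
          numBuckets + parallelism,
          worstCaseWithoutRangeShuffle(numBuckets, parallelism));
    }
  }
}
```

For example, the model yields 7 files for NUM_BUCKETS = 5 and parallelism = 3, compared with 15 in the worst case without range shuffle. For the divisible cases (8, 4) and (4, 8), it yields 8 files.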
It looks like the new unit test is flaky: https://github.com/apache/iceberg/actions/runs/10825717894/job/30035219384
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
…mn (apache#11033) (cherry picked from commit 4b71d40)
Also started to use the new DataGeneratorSource, which is only available in 1.19 and later; hence, the unit test was not added to 1.18.