[SPARK-33160][SQL][FOLLOWUP] Update benchmarks of INT96 type rebasing #30118
MaxGekk wants to merge 3 commits into apache:master from
@HyukjinKwon @cloud-fan @tomvanbussel @ala @mswit-databricks @bart-samwel Please review this PR.
after 1900, rebase LEGACY       27305  27305  0  3.7  273.0  0.1X
after 1900, rebase CORRECTED    27715  27715  0  3.6  277.2  0.1X
before 1900, rebase LEGACY      30911  30911  0  3.2  309.1  0.1X
before 1900, rebase CORRECTED   27944  27944  0  3.6  279.4  0.1X
Parquet writer without rebasing is ~10% faster.
before 1900, vec off, rebase LEGACY      20371  20458  81   4.9  203.7  0.8X
before 1900, vec off, rebase CORRECTED   17484  17541  54   5.7  174.8  1.0X
before 1900, vec on, rebase LEGACY       10284  10327  45   9.7  102.8  1.6X
before 1900, vec on, rebase CORRECTED     7044   7073  37  14.2   70.4  2.4X
Vectorized Reader speed up: ~30%
after 1900, vec on, rebase LEGACY         7183   7255  94  13.9   71.8  2.3X
after 1900, vec on, rebase CORRECTED      7047   7137  86  14.2   70.5  2.4X
before 1900, vec off, rebase LEGACY      20371  20458  81   4.9  203.7  0.8X
before 1900, vec off, rebase CORRECTED   17484  17541  54   5.7  174.8  1.0X
Parquet-MR speed up ~15%
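The three percentages in the review comments above can be reproduced from the quoted rows. The sketch below assumes that the first numeric column is the best time in milliseconds and that each comment reports the relative reduction in that time when going from LEGACY to CORRECTED for dates before 1900. The object and method names are made up for illustration.

```scala
object SpeedUpSketch {
  // Relative reduction in time, in percent, when moving from LEGACY to CORRECTED.
  // Assumption: the inputs are the first numeric column of the quoted rows.
  def reductionPct(legacyMs: Double, correctedMs: Double): Double =
    (legacyMs - correctedMs) / legacyMs * 100.0

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      ("write, before 1900", 30911.0, 27944.0),
      ("read, before 1900, vec on", 10284.0, 7044.0),
      ("read, before 1900, vec off", 20371.0, 17484.0))

    cases.foreach { case (name, legacyMs, correctedMs) =>
      println(f"$name%-28s ${reductionPct(legacyMs, correctedMs)}%.1f%%")
    }
  }
}
```

Under those assumptions the reductions come out at roughly 9.6%, 31.5% and 14.2%, in line with the ~10%, ~30% and ~15% quoted in the comments.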
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #130090 has finished for PR 30118 at commit
Merged to master.
What changes were proposed in this pull request?
1. Use the SQL config spark.sql.legacy.parquet.int96RebaseModeInWrite, which was added by [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing #30056, in DateTimeRebaseBenchmark. The parquet readers should infer the correct rebasing mode automatically from metadata.
2. Regenerate the results of DateTimeRebaseBenchmark in the environment: sudo add-apt-repository ppa:openjdk-r/ppa & sudo apt install openjdk-11-jdk
Why are the changes needed?
To have up-to-date info about the performance of INT96, which is the default Parquet type for Catalyst's timestamp type.
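For reviewers who want to try the write-side config outside the benchmark, here is a minimal sketch of how it might be set when saving INT96 timestamps. It is not the benchmark code itself: the object name, the local master setting and the output path are hypothetical, and it assumes a Spark build that already includes the config from #30056.

```scala
import org.apache.spark.sql.SparkSession

object Int96RebaseSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("int96-rebase-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical output location, chosen only for this sketch.
    val path = "/tmp/int96_rebase_sketch"

    // Store timestamps as INT96 and write them without rebasing.
    // LEGACY would rebase the values on write instead.
    spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")
    spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")

    // A timestamp before 1900, the range where the two modes differ in cost.
    val df = Seq("1899-12-31 23:59:59").toDF("s")
      .selectExpr("cast(s as timestamp) as ts")
    df.write.mode("overwrite").parquet(path)

    // No read-side config is set here: per the PR description, the readers
    // should infer the rebasing mode from the file metadata.
    spark.read.parquet(path).show(false)

    spark.stop()
  }
}
```

Switching the second config value between LEGACY and CORRECTED mirrors the "rebase LEGACY" and "rebase CORRECTED" cases in the benchmark rows.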
Does this PR introduce any user-facing change?
No
How was this patch tested?
By updating benchmark results: