GH-45201: [C++][Parquet] Improve performance of generating size statistics#45202
GH-45201: [C++][Parquet] Improve performance of generating size statistics#45202pitrou merged 4 commits intoapache:mainfrom
Conversation
|
cc @wgtmac |
|
@ursabot please benchmark lang=C++ |
|
|
e67472b to
8a4ffbc
Compare
|
Oops, looks like the Continuous Benchmarking site is completely broken now. |
|
Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit e67472b946bec531f3ce34067e177534cebfb390. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
| int64_t WriteLevels(int64_t num_values, const int16_t* def_levels, | ||
| int64_t WriteLevels(int64_t num_levels, const int16_t* def_levels, | ||
| const int16_t* rep_levels) { | ||
| // Update histograms now, to maximize cache efficiency. |
There was a problem hiding this comment.
Don't fully understand why scaning a pass of def_levels and rep_levels first would maximize cache efficiency, would WriteDefinitionLevels first also update the L1D cache?
There was a problem hiding this comment.
I'm not sure and I don't see any difference on our mini-benchmark. But WriteDefinitionLevels also writes to a destination buffer, so perhaps some of the levels will be evicted from the CPU cache.
|
Does the benchmark result changed after the new code patch? |
|
Yes, it's even better, see the new numbers in the PR description. |
| void UpdateLevelHistogram(::arrow::util::span<const int16_t> levels, | ||
| ::arrow::util::span<int64_t> histogram) { | ||
| const int64_t num_levels = static_cast<int64_t>(levels.size()); | ||
| const int16_t max_level = static_cast<int16_t>(histogram.size() - 1); |
There was a problem hiding this comment.
(Should we DCHECK_GE(histogram.size(), 1)?)
|
@github-actions crossbow submit -g cpp |
|
Revision: 9f9bc94 Submitted crossbow builds: ursacomputing/crossbow @ actions-f6939022a3 |
|
The Valgrind error is unrelated. |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2b5f56c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…stics (#45202) We found out in #45085 that there is a non-trivial overhead when writing size statistics is enabled. Dramatically reduce overhead by speeding up def/rep levels histogram updates. Performance results on the author's machine: ``` ------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------------------------------------------------------------ BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type> 8103053 ns 8098569 ns 86 bytes_per_second=1003.26Mi/s items_per_second=129.477M/s output_size=537.472k page_index_size=33 BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type> 8153499 ns 8148492 ns 86 bytes_per_second=997.117Mi/s items_per_second=128.683M/s output_size=537.488k page_index_size=33 BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type> 8212560 ns 8207754 ns 83 bytes_per_second=989.918Mi/s items_per_second=127.754M/s output_size=537.502k page_index_size=47 BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType> 10405020 ns 10400775 ns 67 bytes_per_second=444.142Mi/s items_per_second=100.817M/s output_size=848.305k page_index_size=34 BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType> 10464784 ns 10460778 ns 66 bytes_per_second=441.594Mi/s items_per_second=100.239M/s output_size=848.325k page_index_size=34 BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType> 10469832 ns 10465739 ns 67 bytes_per_second=441.385Mi/s items_per_second=100.191M/s output_size=848.344k page_index_size=48 BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type> 13004962 ns 12992678 ns 52 bytes_per_second=657.101Mi/s items_per_second=80.7052M/s output_size=617.464k page_index_size=34 BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type> 13718352 ns 13705599 ns 50 bytes_per_second=622.921Mi/s items_per_second=76.5071M/s output_size=617.486k page_index_size=34 BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type> 13845553 ns 13832138 ns 52 bytes_per_second=617.222Mi/s items_per_second=75.8072M/s output_size=617.506k page_index_size=54 BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType> 15715263 ns 15702707 ns 44 bytes_per_second=320.449Mi/s items_per_second=66.7768M/s output_size=927.326k page_index_size=35 BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType> 16507328 ns 16493800 ns 43 bytes_per_second=305.079Mi/s items_per_second=63.5739M/s output_size=927.352k page_index_size=35 BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType> 16575359 ns 16561311 ns 42 bytes_per_second=303.836Mi/s items_per_second=63.3148M/s output_size=927.377k page_index_size=55 ``` Performance results without this PR: ``` ------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------------------------------------------------------------ BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type> 8042576 ns 8037678 ns 87 bytes_per_second=1010.86Mi/s items_per_second=130.458M/s output_size=537.472k page_index_size=33 BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type> 9576627 ns 9571279 ns 73 bytes_per_second=848.894Mi/s items_per_second=109.554M/s output_size=537.488k page_index_size=33 BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type> 9570204 ns 9563595 ns 73 bytes_per_second=849.576Mi/s items_per_second=109.642M/s output_size=537.502k page_index_size=47 BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType> 10165397 ns 10160868 ns 69 bytes_per_second=454.628Mi/s items_per_second=103.197M/s output_size=848.305k page_index_size=34 BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType> 11662568 ns 11657396 ns 60 bytes_per_second=396.265Mi/s items_per_second=89.9494M/s output_size=848.325k page_index_size=34 BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType> 11657135 ns 11653063 ns 60 bytes_per_second=396.412Mi/s items_per_second=89.9829M/s output_size=848.344k page_index_size=48 BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type> 13182006 ns 13168704 ns 51 bytes_per_second=648.318Mi/s items_per_second=79.6264M/s output_size=617.464k page_index_size=34 BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type> 16438205 ns 16421762 ns 43 bytes_per_second=519.89Mi/s items_per_second=63.8528M/s output_size=617.486k page_index_size=34 BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type> 16424615 ns 16409032 ns 42 bytes_per_second=520.293Mi/s items_per_second=63.9024M/s output_size=617.506k page_index_size=54 BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType> 15387808 ns 15373086 ns 46 bytes_per_second=327.32Mi/s items_per_second=68.2086M/s output_size=927.326k page_index_size=35 BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType> 18319628 ns 18302938 ns 37 bytes_per_second=274.924Mi/s items_per_second=57.29M/s output_size=927.352k page_index_size=35 BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType> 18346665 ns 18329336 ns 37 bytes_per_second=274.528Mi/s items_per_second=57.2075M/s output_size=927.377k page_index_size=55 ``` Tested by existing tests, validated by existing benchmarks. No. * GitHub Issue: #45201 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Rationale for this change
We found out in #45085 that there is a non-trivial overhead when writing size statistics is enabled.
What changes are included in this PR?
Dramatically reduce overhead by speeding up def/rep levels histogram updates.
Performance results on the author's machine:
Performance results without this PR:
Are these changes tested?
Tested by existing tests, validated by existing benchmarks.
Are there any user-facing changes?
No.