GH-45227: [C++][Parquet] Enable Size Stats and Page Index by default #45249

Merged
wgtmac merged 2 commits into apache:main from wgtmac:size_bench_page_index on Jan 21, 2025

@wgtmac wgtmac commented Jan 14, 2025

Rationale for this change

Benchmark data show that enabling the page index and size statistics by default does not incur a significant performance penalty.

What changes are included in this PR?

Make the Parquet writer generate the page index and size statistics by default.

Are these changes tested?

Covered by the existing CI checks, which pass.

Are there any user-facing changes?

No.

@wgtmac wgtmac marked this pull request as ready for review January 14, 2025 02:23
@github-actions

⚠️ GitHub issue #45227 has been automatically assigned in GitHub to PR creator.

wgtmac commented Jan 14, 2025

@pitrou @mapleFU Would you please take a look?

@mapleFU mapleFU left a comment

Looks OK to me, but I don't know what others think about this. Do we need to report this on the mailing list?

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Jan 14, 2025

pitrou commented Jan 14, 2025

> Looks OK to me, but I don't know what others think about this. Do we need to report this on the mailing list?

We can probably mention it on the dev@parquet mailing list?

wgtmac commented Jan 14, 2025

For the record, here is the benchmark data from my dev box:

Run on (32 X 2700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1280 KiB (x16)
  L3 Unified 49152 KiB (x1)
Load Average: 1.01, 1.06, 0.60
------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, false>                  9386745 ns      9386625 ns           74 bytes_per_second=865.593Mi/s items_per_second=111.71M/s output_size=546.08k page_index_size=0
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, true>                   9387974 ns      9387716 ns           75 bytes_per_second=865.493Mi/s items_per_second=111.697M/s output_size=546.091k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, true>            9392147 ns      9391889 ns           74 bytes_per_second=865.108Mi/s items_per_second=111.647M/s output_size=546.107k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, true>     9363539 ns      9363364 ns           75 bytes_per_second=867.744Mi/s items_per_second=111.987M/s output_size=546.121k page_index_size=47
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType, false>                12773243 ns     12772995 ns           55 bytes_per_second=362.473Mi/s items_per_second=82.0932M/s output_size=864.052k page_index_size=0
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType, true>                 12777189 ns     12776940 ns           55 bytes_per_second=362.361Mi/s items_per_second=82.0678M/s output_size=864.083k page_index_size=30
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, true>          12826700 ns     12826328 ns           54 bytes_per_second=360.965Mi/s items_per_second=81.7518M/s output_size=864.103k page_index_size=30
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, true>   12837516 ns     12837345 ns           54 bytes_per_second=360.656Mi/s items_per_second=81.6817M/s output_size=864.122k page_index_size=44
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, false>                      16068608 ns     16068385 ns           43 bytes_per_second=531.323Mi/s items_per_second=65.2571M/s output_size=625.904k page_index_size=0
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, true>                       16026976 ns     16026766 ns           44 bytes_per_second=532.703Mi/s items_per_second=65.4266M/s output_size=625.915k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, true>                16723670 ns     16723159 ns           42 bytes_per_second=510.52Mi/s items_per_second=62.702M/s output_size=625.937k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, true>         16784041 ns     16783691 ns           41 bytes_per_second=508.678Mi/s items_per_second=62.4759M/s output_size=625.957k page_index_size=54
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType, false>                     19524874 ns     19524500 ns           36 bytes_per_second=258.258Mi/s items_per_second=53.7057M/s output_size=944.092k page_index_size=0
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType, true>                      19822498 ns     19822084 ns           35 bytes_per_second=254.381Mi/s items_per_second=52.8994M/s output_size=944.123k page_index_size=31
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, true>               20485041 ns     20484509 ns           34 bytes_per_second=246.155Mi/s items_per_second=51.1887M/s output_size=944.149k page_index_size=31
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, true>        20354835 ns     20354291 ns           34 bytes_per_second=247.73Mi/s items_per_second=51.5162M/s output_size=944.174k page_index_size=51
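
To make the numbers above easier to compare, here is a minimal sketch that computes the relative change in wall-clock time between the baseline rows (`SizeStatisticsLevel::None`, page index `false`) and the fully enabled rows (`SizeStatisticsLevel::PageAndColumnChunk`, page index `true`). The names `OverheadPercent`, `Row`, and `kRows` are hypothetical helpers introduced only for this illustration and are not part of Arrow or the benchmark code. The timings are copied from the "Time" column above and come from a single run, so small differences may well be noise.

```cpp
#include <cmath>
#include <cstdio>

// Relative change of candidate_ns versus baseline_ns, in percent.
// A positive value means the candidate run was slower than the baseline.
double OverheadPercent(double baseline_ns, double candidate_ns) {
  return (candidate_ns - baseline_ns) / baseline_ns * 100.0;
}

// One benchmark family: baseline time vs. time with both features enabled.
struct Row {
  const char* name;
  double baseline_ns;  // SizeStatisticsLevel::None, page index disabled
  double enabled_ns;   // SizeStatisticsLevel::PageAndColumnChunk, page index enabled
};

// Wall-clock times (ns) copied from the benchmark output quoted above.
constexpr Row kRows[] = {
    {"BM_WritePrimitiveColumn<Int64Type>", 9386745.0, 9363539.0},
    {"BM_WritePrimitiveColumn<StringType>", 12773243.0, 12837516.0},
    {"BM_WriteListColumn<Int64Type>", 16068608.0, 16784041.0},
    {"BM_WriteListColumn<StringType>", 19524874.0, 20354835.0},
};

// Print one line per benchmark family with its relative time change.
void PrintOverheads() {
  for (const Row& row : kRows) {
    std::printf("%-40s %+.2f%%\n", row.name,
                OverheadPercent(row.baseline_ns, row.enabled_ns));
  }
}
```

Calling `PrintOverheads()` from a `main` function prints one percentage per benchmark family. On these figures the primitive-column cases stay within roughly half a percent of the baseline, while the list-column cases are a little over 4% slower.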

@wgtmac wgtmac merged commit 1fcc892 into apache:main Jan 21, 2025
33 checks passed
@wgtmac wgtmac removed the "awaiting committer review" label Jan 21, 2025
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 1fcc892.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.
