GH-40592: [C++][Parquet] Implement SizeStatistics#40594
Conversation
emkornfield
left a comment
There was a problem hiding this comment.
A few high level questions/suggestions.
emkornfield
left a comment
There was a problem hiding this comment.
Took another quick pass through and didn't see any major blockers, as long as @pitrou is happy with changes I'm OK to merge.
|
Thanks for your thorough review! I think I have addressed all comments except for the benchmark. Could you please take a look again? @pitrou |
|
Gentle ping @pitrou :) |
|
Damn, I also hit the ORC timeout issue from the failed CI: |
|
Rebased and got following (unrelated) failure from the MacOS CI: |
|
The macOS failure was fixed in #45057, so you can rebase another time 😁 |
|
Here we go! |
|
Thanks a lot @wgtmac , we can merge now. |
|
Thank you for your thorough review! Let me work on the benchmark. |
|
Grats! |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f93004f. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 16 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Parquet format 2.10.0 has introduced SizeStatistics. parquet-mr has also implemented this: apache/parquet-java#1177. Now it is time for parquet-cpp to pick the ball.
What changes are included in this PR?
Implement reading and writing size statistics for parquet-cpp.
Are these changes tested?
Yes, a bunch of test cases have been added.
Are there any user-facing changes?
Yes, now parquet users are able to read and write size statistics.