[ARROW-17904] add flatten data files with checksums on datapage v1 #29
pitrou merged 4 commits into apache:master from
This will be used in apache/arrow#14351
I'm not sure I understand. Is there one file or two files with corrupt CRCs?
Let's draw some pictures here:
And each page has
Thanks @mapleFU, can you add that information to https://github.com/apache/parquet-testing/blob/master/data/README.md ?
I've uploaded a basic description. I may not be able to write a good document, though, so feel free to edit it directly, @pitrou.
Here I add 3 files for checking checksums. Their schemas are all the same:
And their contents are equal:
The code is borrowed and modified from parquet-mr's TestDataPageV1Checksums, because it seems that only parquet-mr implements CRC on data page v1. Each file has two columns, and each column has two pages; each page is sized to be about 10 KB.
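For anyone using these files in a reader test, here is a minimal sketch of the check they are meant to exercise. It assumes, as I understand the Parquet format, that the page header's `crc` field is a standard CRC-32 (the same one gzip uses) over the page bytes as written to disk, excluding the page header. The helper name `page_crc_matches` and the sample bytes are hypothetical, and this is not the parquet-mr or Arrow implementation.

```python
import zlib


def page_crc_matches(page_bytes: bytes, stored_crc: int) -> bool:
    """Return True if the CRC-32 of the page body equals the stored value.

    Hypothetical helper: `page_bytes` is assumed to be the page body exactly
    as written to disk (after any compression), without the page header.
    """
    # Thrift declares the field as a signed i32, so a reader may see a
    # negative number; mask both sides to compare as unsigned 32-bit values.
    return (zlib.crc32(page_bytes) & 0xFFFFFFFF) == (stored_crc & 0xFFFFFFFF)


# Made-up page body standing in for one data page.
page = b"\x01\x02\x03\x04" * 16
good_crc = zlib.crc32(page)

# An intact page passes the check.
assert page_crc_matches(page, good_crc)

# Flipping a single bit in the body must make the check fail.
corrupted = bytes([page[0] ^ 0x01]) + page[1:]
assert not page_crc_matches(corrupted, good_crc)
```

A reader test would run the same comparison on every page of a file and expect it to fail only for a file whose checksums are deliberately corrupt.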