Add column index writer for parquet#1935
Codecov Report
@@ Coverage Diff @@
## master #1935 +/- ##
==========================================
+ Coverage 83.41% 83.54% +0.12%
==========================================
Files 214 221 +7
Lines 57004 57395 +391
==========================================
+ Hits 47551 47949 +398
+ Misses 9453 9446 -7
@Ted-Jiang PTAL
Do we need to add an option to control this feature in the |
tustvold
left a comment
This is looking very nice. I think it would be good to maintain a clearer separation between file-level metadata and the index metadata. Mixing the two not only leads to the mutability issues you've run into, but can also make it hard to reason about which fields are populated when. Perhaps we could have something like a ColumnChunkIndex?
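One way to picture the separation suggested above is a pair of independent structs. This is only a sketch: `ColumnChunkMeta`, its fields, and the `push_page` / `num_pages` helpers are hypothetical names for illustration, not the types this PR introduces. The three `Vec` fields are taken from the diff under review, and the empty-bytes placeholder for all-null pages is an assumption.

```rust
/// Hypothetical file-level metadata: known once the column chunk is closed
/// and not mutated afterwards.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnChunkMeta {
    pub num_values: i64,
    pub data_page_offset: i64,
}

/// Hypothetical index metadata, kept apart from `ColumnChunkMeta` and
/// accumulated one page at a time while the chunk is being written.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ColumnChunkIndex {
    pub null_pages: Vec<bool>,
    pub min_values: Vec<Vec<u8>>,
    pub max_values: Vec<Vec<u8>>,
}

impl ColumnChunkIndex {
    /// Record one page. `None` marks a page without min/max statistics
    /// (for example, an all-null page).
    pub fn push_page(&mut self, stats: Option<(Vec<u8>, Vec<u8>)>) {
        match stats {
            Some((min, max)) => {
                self.null_pages.push(false);
                self.min_values.push(min);
                self.max_values.push(max);
            }
            None => {
                // Assumption: empty byte arrays act as placeholders so the
                // three vectors always have one entry per page.
                self.null_pages.push(true);
                self.min_values.push(Vec::new());
                self.max_values.push(Vec::new());
            }
        }
    }

    /// Number of pages recorded so far.
    pub fn num_pages(&self) -> usize {
        self.null_pages.len()
    }
}
```

With this shape only the index needs `&mut` access during writing, which is the kind of mutability split the comment is asking for.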
@tustvold reset, PTAL
tustvold
left a comment
Love it, some minor nits and then this can go in 😄
    null_pages: Vec<bool>,
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
    // TODO: calc the order for all pages in this column
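For the TODO above, a rough sketch of how the order could be derived from the per-page statistics. This is not what the PR implements: the `BoundaryOrder` enum here is a local stand-in, the function name is hypothetical, and comparing the raw bytes lexicographically is a simplifying assumption (a real writer would need to compare values according to the column's type and sort order).

```rust
/// Local stand-in for the boundary order recorded in a column index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoundaryOrder {
    Unordered,
    Ascending,
    Descending,
}

/// Derive the boundary order from per-page min/max values.
/// Assumption: plain lexicographic byte comparison; null pages are skipped.
pub fn boundary_order(
    null_pages: &[bool],
    min_values: &[Vec<u8>],
    max_values: &[Vec<u8>],
) -> BoundaryOrder {
    // Keep only the pages that carry real statistics.
    let pages: Vec<(&[u8], &[u8])> = null_pages
        .iter()
        .zip(min_values.iter().zip(max_values.iter()))
        .filter(|(is_null, _)| !**is_null)
        .map(|(_, (min, max))| (min.as_slice(), max.as_slice()))
        .collect();

    // Ascending: both mins and maxs are non-decreasing from page to page.
    let ascending = pages
        .windows(2)
        .all(|w| w[0].0 <= w[1].0 && w[0].1 <= w[1].1);
    // Descending: both mins and maxs are non-increasing from page to page.
    let descending = pages
        .windows(2)
        .all(|w| w[0].0 >= w[1].0 && w[0].1 >= w[1].1);

    if ascending {
        BoundaryOrder::Ascending
    } else if descending {
        BoundaryOrder::Descending
    } else {
        BoundaryOrder::Unordered
    }
}
```

Note that with zero or one non-null pages, or when all pages share the same bounds, this sketch reports `Ascending`.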
👍 This is useful for checking whether to use the page index.
If the data in the pages is sorted in ascending or descending order, we can use binary search to accelerate page filtering.
If the boundary order is UNORDERED, we need to filter the pages one by one.
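To illustrate the point about page filtering, here is a minimal sketch for an equality predicate. Everything in it is an assumption for illustration: `PageOrder` and `candidate_pages` are hypothetical names, and the min/max values are shown as already-decoded `i64`s rather than the raw bytes stored in the index.

```rust
/// Hypothetical ordering flag for the pages of one column chunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageOrder {
    Unordered,
    Ascending,
    Descending,
}

/// Return the indices of pages whose [min, max] range may contain `target`.
pub fn candidate_pages(
    mins: &[i64],
    maxs: &[i64],
    order: PageOrder,
    target: i64,
) -> Vec<usize> {
    match order {
        PageOrder::Ascending => {
            // Pages with max < target form a prefix that can be skipped;
            // pages with min <= target form a prefix that may match.
            let start = maxs.partition_point(|&max| max < target);
            let end = mins.partition_point(|&min| min <= target);
            (start..end).collect()
        }
        PageOrder::Descending => {
            // Mirror image: pages with min > target form a prefix to skip;
            // pages with max >= target form a prefix that may match.
            let start = mins.partition_point(|&min| min > target);
            let end = maxs.partition_point(|&max| max >= target);
            (start..end).collect()
        }
        PageOrder::Unordered => {
            // No ordering guarantee: check every page one by one.
            (0..mins.len())
                .filter(|&i| mins[i] <= target && target <= maxs[i])
                .collect()
        }
    }
}
```

The ordered branches use two binary searches (`partition_point`) instead of a full scan, so the cost grows with the logarithm of the page count rather than linearly.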
Thank you 🥇
Which issue does this PR close?
part of #1777
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?