Support skip_page missing OffsetIndex Fallback in SerializedPageReader#2460
Support skip_page missing OffsetIndex Fallback in SerializedPageReader#2460tustvold merged 3 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
without this clippy get
error: large size difference between variants
--> parquet/src/file/serialized_reader.rs:484:5
|
484 | / Values {
485 | | /// The current byte offset in the reader
486 | | offset: usize,
487 | |
... |
492 | | next_page_header: Option<PageHeader>,
493 | | },
| |_____^ this variant is 336 bytes
|
= note: `-D clippy::large-enum-variant` implied by `-D warnings`
note: and the second-largest variant is 72 bytes:
--> parquet/src/file/serialized_reader.rs:494:5
but i think the 336 is wrong because in PageHeader, most of the large arg are option, clippy can not know when running only one is some , so i think remove this check here is reasonable (or wrap 'Box' i think this will downgrade)🤔
pub data_page_header: Option<DataPageHeader>,
pub index_page_header: Option<IndexPageHeader>,
pub dictionary_page_header: Option<DictionaryPageHeader>,
pub data_page_header_v2: Option<DataPageHeaderV2>,
There was a problem hiding this comment.
An option takes up the same space regardless of it is set, they're like C++ union types in that sense. I think you should box the page header probably...
There was a problem hiding this comment.
you are right! tested in ut
#[test]
fn test() {
enum Test {
Large {
page: Option<PageHeader>,
}
}
let test1 = Test::Large { page: None };
println!("{}", std::mem::size_of_val(&test1));
()
}
got 320 😂 thanks for your info!
I should try before ask😔
tustvold
left a comment
There was a problem hiding this comment.
Looking good, just some minor nits
| if *remaining_bytes == 0 { | ||
| return Ok(None); | ||
| } | ||
| return if let Some(header) = next_page_header.take() { |
There was a problem hiding this comment.
I think this probably shouldn't take but just be as_ref?
There was a problem hiding this comment.
Yes, the only way set it none. Is after skip_next_page.
|
I'm going to get this in as the fix for parquet_derive is in #2494 and I want this to make the release 😄 |
|
Benchmark runs are scheduled for baseline = 9f77e4e and contender = 12cc067. 12cc067 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2459
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?