feat: Add optional prefetch hint for parsing Puffin Footer#1207
feat: Add optional prefetch hint for parsing Puffin Footer#1207liurenjie1024 merged 16 commits intoapache:mainfrom
Conversation
|
Hi, I believe this hint should be optional and we shouldn't require all users to provide it. |
|
@Xuanwo It is optional, let me change the title. Or do you mean that i should add a function like |
Yep. I believe we should not change the |
|
@Xuanwo Thanks for the review, it is fixed! |
| pub(crate) properties: HashMap<String, String>, | ||
| /// Optional prefetch hint you can pass for retrieving footer | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub(crate) prefetch_hint: Option<u8>, |
There was a problem hiding this comment.
I have concerns adding this field for two reasons:
- It looks impractical to ask user to pass in the
prefetch_hint. - This format is define by iceberg spec, and adding another field may lead to wrong file.
| /// `prefetch_hint` is used to try to fetch the entire footer in one read. If | ||
| /// the entire footer isn't fetched in one read the function will call the `read_no_prefetch` | ||
| /// option. | ||
| pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { |
There was a problem hiding this comment.
| pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { | |
| pub(crate) async fn read(input_file: &InputFile) -> Result<FileMetadata> { |
Suggestions:
- Keep original api
- Add anoter method
read_with_hintto implement the method with a hint.
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @jonathanc-n for this pr, generally looks good, some nit!
|
|
||
| // Hint cannot be larger than input file | ||
| if prefetch_hint as u64 > input_file_length { | ||
| return FileMetadata::read(input_file).await; |
There was a problem hiding this comment.
It would be better to add some warning log here.
There was a problem hiding this comment.
Lets push this pull request forward and do this in a separate pull request (good first issue). Are we going to be using tracing in this crate?
| + FileMetadata::FOOTER_STRUCT_LENGTH as usize | ||
| + FileMetadata::MAGIC_LENGTH as usize; | ||
| if footer_length > prefetch_hint as usize { | ||
| return FileMetadata::read(input_file).await; |
| /// `prefetch_hint` is used to try to fetch the entire footer in one read. If | ||
| /// the entire footer isn't fetched in one read the function will call the `read_no_prefetch` | ||
| /// option. | ||
| pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { |
|
|
||
| // Hint cannot be larger than input file | ||
| if prefetch_hint as u64 > input_file_length { | ||
| return FileMetadata::read(input_file).await; |
There was a problem hiding this comment.
Lets push this pull request forward and do this in a separate pull request (good first issue). Are we going to be using tracing in this crate?
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @jonathanc-n for this pr, LGTM!
|
|
||
| assert_eq!( | ||
| FileMetadata::read(&input_file).await.unwrap_err().to_string(), | ||
| FileMetadata::read(&input_file, ).await.unwrap_err().to_string(), |
There was a problem hiding this comment.
This change is unnecessary?
There was a problem hiding this comment.
Sorry about that, forgot to delete. Fixed!
Which issue does this PR close?
What changes are included in this PR?
Add prefetch hint for parsing puffin footer + change some docs
Are these changes tested?
Yes, unit tested