Hi, I'm raising this both as a question and a possible bug. I'd appreciate your feedback to determine whether this is a confirmed issue.
The Question
    /// The current byte offset in the reader
    offset: usize,
My concern is about the type of offset in SerializedPageReaderState. Should it be u64 instead of usize? If I understand correctly, this offset is a global position within a Parquet file, which can easily exceed 4 GB. On 32-bit targets (e.g., 32-bit WebAssembly), usize is only 32 bits wide and tops out at u32::MAX, so offsets into larger files cannot be represented.
The Potential Bug
I regularly use a Parquet viewer compiled to 32-bit WebAssembly, and I hit an error with a file larger than 4 GB. An offset I read exceeded u32::MAX, which produced the following error:
Integer overflow: out of range integral type conversion attempted
I traced this to the line where the error is raised and verified that the offset involved is a global file position that does exceed u32::MAX:
    offset: usize::try_from(start)?,
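To illustrate the failure without needing a wasm32 build, here is a minimal sketch. It uses u32 as a stand-in for what usize is on a 32-bit target, so it behaves the same way on a 64-bit host. The function names to_offset_32bit and page_len_to_usize are hypothetical and are not part of the parquet crate. The second function only sketches one possible direction (keep the file offset as u64 and convert only in-memory lengths to usize); it is not a proposed patch.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical stand-in for `usize::try_from(start)` on a 32-bit target,
/// where usize is 32 bits wide. u32 is used so the behavior is reproducible
/// on a 64-bit host as well.
fn to_offset_32bit(start: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(start)
}

/// Hypothetical alternative: the file offset stays a u64, and only sizes that
/// must fit in memory (for example, the length of a single page) are
/// converted to usize.
fn page_len_to_usize(len: u64) -> Result<usize, TryFromIntError> {
    usize::try_from(len)
}

fn main() {
    // A global file offset of 5 GiB, as could occur in a file larger than 4 GB.
    let start: u64 = 5 * 1024 * 1024 * 1024;

    // The conversion fails once the offset is past u32::MAX.
    match to_offset_32bit(start) {
        Ok(offset) => println!("offset fits: {offset}"),
        Err(e) => println!("conversion failed: {e}"),
    }

    // Offsets up to and including u32::MAX still convert.
    assert!(to_offset_32bit(u64::from(u32::MAX)).is_ok());
    assert!(to_offset_32bit(u64::from(u32::MAX) + 1).is_err());

    // A small in-memory length converts on any target.
    assert_eq!(page_len_to_usize(4096).unwrap(), 4096usize);
}
```

If the offset field were a u64, the fallible conversion would only be needed for values that really are bounded by addressable memory, not for positions in the file.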