
[Parquet] [Question / Potential Bug Report] Should SerializedPageReaderState.offset & remaining_bytes be u64 instead of usize? #7910

@JigaoLuo


Hi, I'm raising this both as a question and a possible bug. I'd appreciate your feedback to determine whether this is a confirmed issue.

The Question

/// The current byte offset in the reader
offset: usize,

My concern is the type of offset in SerializedPageReaderState: should it be u64 instead of usize? If I understand correctly, this offset is a global byte position within a Parquet file, which can easily exceed 4 GiB. On 32-bit targets (e.g., WebAssembly), usize is only 32 bits wide and cannot hold a value above u32::MAX, so offsets into larger files cannot be represented.
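To make the concern concrete, here is a minimal sketch of the conversion failure. It assumes a 64-bit host, so it uses u32 as a stand-in for a 32-bit usize; the helper name to_usize32 is hypothetical and not part of the parquet crate.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: mimics `usize::try_from(u64)` on a 32-bit target
/// by converting to `u32`, so the failure reproduces on any host.
fn to_usize32(offset: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(offset)
}

fn main() {
    // An offset inside a small file fits in 32 bits.
    let small: u64 = 1 << 20;
    // An offset 5 GiB into a file does not.
    let large: u64 = 5 * 1024 * 1024 * 1024;

    assert!(to_usize32(small).is_ok());
    assert!(to_usize32(large).is_err());

    // Print the standard library's conversion error for the large offset.
    if let Err(e) = to_usize32(large) {
        println!("conversion failed: {e}");
    }
}
```

The same conversion succeeds for both values when the target type is a 64-bit usize, which is why the problem only appears on 32-bit builds.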

The Potential Bug

I frequently use a Parquet viewer compiled to 32-bit WebAssembly, and I hit an error with a file larger than 4 GB. The offset being read exceeded u32::MAX, which produced the following error:

Integer overflow: out of range integral type conversion attempted

I traced this to the line below, where the error is raised, and verified that the offset involved is a global file offset that does exceed u32::MAX.

offset: usize::try_from(start)?,
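One possible direction, sketched under assumptions: keep the global position and remaining byte count as u64, and convert to usize only for the length of a single page buffer, which has to fit in memory anyway. PageCursor and advance are hypothetical names for illustration, not the actual parquet types or API.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical stand-in for the reader state; not the real parquet struct.
#[derive(Debug)]
struct PageCursor {
    /// Global byte position in the file; may exceed `u32::MAX`.
    offset: u64,
    /// Bytes left to read in the column chunk.
    remaining_bytes: u64,
}

impl PageCursor {
    fn new(start: u64, len: u64) -> Self {
        PageCursor { offset: start, remaining_bytes: len }
    }

    /// Advance past a page of up to `page_len` bytes and return the
    /// in-memory buffer length needed for it. Only this page-local
    /// length is converted to `usize`; the global offset stays `u64`.
    fn advance(&mut self, page_len: u64) -> Result<usize, TryFromIntError> {
        let take = page_len.min(self.remaining_bytes);
        let buf_len = usize::try_from(take)?;
        self.offset += take;
        self.remaining_bytes -= take;
        Ok(buf_len)
    }
}

fn main() {
    // Start 5 GiB into the file, which a 32-bit usize could not represent.
    let start: u64 = 5 * 1024 * 1024 * 1024;
    let mut cursor = PageCursor::new(start, 1024);

    let buf_len = cursor.advance(100).expect("page length fits in usize");
    println!("buffer of {buf_len} bytes, cursor now {cursor:?}");
}
```

With this layout, a 32-bit build would only fail if a single page were larger than usize::MAX, not merely because the page sits far into a large file.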


Labels: question (Further information is requested)
