Skip to content

Conversation

@JigaoLuo
Copy link
Contributor

@JigaoLuo JigaoLuo commented Jul 12, 2025

Which issue does this PR close?

Rationale for this change

There is a copy from my issue page:

/// The current byte offset in the reader
offset: usize,

My concern is about the type of offset in SerializedPageReaderState. Should it be u64 instead of usize? If I understand correctly, this offset represents a global position within a Parquet file, which can easily exceed 4 GB. On 32-bit environments (e.g., WebAssembly), usize is limited to u32's max, which could lead to problems with larger files.

What changes are included in this PR?

This PR does type changes only for SerializedPageReaderState.offset & remaining_bytes

Are these changes tested?

I can pass with local unit tests via cargo test -p parquet

Are there any user-facing changes?

No

JigaoLuo added 2 commits July 12, 2025 15:03
Signed-off-by: Jigao Luo <[email protected]>
Signed-off-by: Jigao Luo <[email protected]>
@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 12, 2025
@JigaoLuo JigaoLuo changed the title Use u64 for SerializedPageReaderState.offset & remaining_bytes, instead of usize [Parquet] Use u64 for SerializedPageReaderState.offset & remaining_bytes, instead of usize Jul 12, 2025
Signed-off-by: Jigao Luo <[email protected]>
Copy link
Contributor

@XiangpengHao XiangpengHao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @JigaoLuo this looks good to me! left minor optional doc change.

Signed-off-by: Jigao Luo <[email protected]>
@JigaoLuo
Copy link
Contributor Author

Thank you @XiangpengHao for reviewing. I have updated the doc.

@alamb alamb added the api-change Changes to the arrow API label Jul 12, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JigaoLuo and @XiangpengHao -- this makes sense to me

@etseidl or @jhorstmann do you have any concerns about this change?

@jhorstmann
Copy link
Contributor

Looks good to me, thanks!

@alamb alamb merged commit 5555d30 into apache:main Jul 14, 2025
17 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 14, 2025

Thanks again @JigaoLuo and thanks @mbrobbel and @jhorstmann

@JigaoLuo
Copy link
Contributor Author

Thank you to all the reviewers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Parquet] [Question / Potential Bug Report] Should SerializedPageReaderState.offset & remaining_bytes be u64 instead of usize?

5 participants