Skip to content

Conversation

@mapleFU
Copy link
Member

@mapleFU mapleFU commented Sep 1, 2025

Which issue does this PR close?

Rationale for this change

See #8263

What changes are included in this PR?

Avoid page overflows

Are these changes tested?

  • Will add

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 1, 2025
Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mapleFU, it seems desirable to me to add a little error checking. This needs a little more work, and it would be nice to add a test. You could dummy up a CompressedPage with a ridiculously large uncompressed_size for instance.

@mapleFU mapleFU requested a review from etseidl September 9, 2025 15:29
@mapleFU
Copy link
Member Author

mapleFU commented Sep 9, 2025

I've change the logic to to_thrift_page ( which is just pub(crate) and commonly used) and add tests. Would you mind take a look again? @alamb @etseidl

Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @mapleFU!

I'm a little sad that this is going in to_thrift since that will hopefully be removed soon (#5854). I'll be sure to merge this into my working branch.

Nevermind...CompressedPage::to_thrift_headr actually remains 😅

@mapleFU
Copy link
Member Author

mapleFU commented Sep 10, 2025

Actually I just meets a file which is corrupt by this is issue, cry :-(

            {
              "compression": "zstd",
              "encoding": "delta_length_byte_array",
              "page_type": "data_page_v2",
              "offset": 5662247374,
              "compressed_bytes": 581761248,
              "uncompressed_bytes": -1963337762,
              "header_bytes": 34,
              "num_values": 10000
            },

@mapleFU mapleFU requested a review from alamb September 12, 2025 02:03
@mapleFU mapleFU added the bug label Sep 12, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @mapleFU and @etseidl

@alamb alamb merged commit 30a7798 into apache:main Sep 12, 2025
17 of 18 checks passed
@mapleFU mapleFU deleted the avoid-parquet-page-size-overflow branch September 13, 2025 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parquet: Avoid page size exceeds i32::MAX

3 participants