Skip to content

Limit ArrowWriter Row Group Size by bytes in addition to rows #1213

@tustvold

Description

@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently ArrowWriter uses max_row_group_size as a row count limit. Whilst this is significantly simpler to implement, it is at odds with other arrow implementations that use a bytes threshold.

Describe the solution you'd like

Any or all of:

  • Clearly document what max_row_group_size is used for and how it is different from the other size quantities in WriterProperties
  • Assess if the DEFAULT_MAX_ROW_GROUP_SIZE of 128 * 1024 * 1024 makes sense given this is not bytes
  • Add functionality to flush based on a bytes threshold instead of, or in addition to, the current row threshold

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementAny new improvement worthy of a entry in the changelogparquetChanges to the parquet crate

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions