Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently ArrowWriter uses max_row_group_size as a row count limit. Whilst this is significantly simpler to implement, it is at odds with other arrow implementations that use a bytes threshold.
Describe the solution you'd like
Any or all of:
- Clearly document what
max_row_group_size is used for and how it is different from the other size quantities in WriterProperties
- Assess if the
DEFAULT_MAX_ROW_GROUP_SIZE of 128 * 1024 * 1024 makes sense given this is not bytes
- Add functionality to flush based on a bytes threshold instead of, or in addition to, the current row threshold