Since this recurses, it could blow out the stack on pathological inputs (e.g. a RecordBatch with 1M rows and a `max_row_group_row_count` of 1). I don't think it is necessary to fix now; I just wanted to point it out.
Originally posted by @alamb in #9357 (comment)
Here is a reproducer (add to `arrow/arrow_writer/mod.rs`) that fails with a stack overflow (the process aborts):
```rust
#[test]
fn test_row_group_limit_rows_only_pathological_stack_overflow_demo() {
    let schema = Arc::new(Schema::new(vec![Field::new(
        "int",
        ArrowDataType::Int32,
        false,
    )]));
    let array = Int32Array::from((0..1_000_000_i32).collect::<Vec<_>>());
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(array)]).unwrap();
    let props = WriterProperties::builder()
        .set_max_row_group_row_count(Some(1))
        .set_max_row_group_bytes(None)
        .build();
    let file = tempfile::tempfile().unwrap();
    let mut writer = ArrowWriter::try_new(file, schema, Some(props)).unwrap();
    // This currently recurses once per row-group split and can overflow the stack.
    writer.write(&batch).unwrap();
}
```
The expected behavior is either an error or, ideally, a successfully written file.
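One way to avoid the overflow is to replace the recursion (one stack frame per row-group split) with a loop that computes each slice's offset and length up front. The sketch below is illustrative only — `split_offsets` is a hypothetical helper, not the actual arrow-rs internals — but it shows the iterative shape: a 1M-row batch with a limit of 1 yields 1M slices with constant stack depth.

```rust
/// Hedged sketch: compute the (offset, length) of each row-group slice
/// iteratively instead of recursively. The caller would then slice the
/// RecordBatch and flush one row group per entry, with no recursion.
fn split_offsets(num_rows: usize, max_rows: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < num_rows {
        // Final chunk may be shorter than max_rows.
        let len = max_rows.min(num_rows - offset);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}
```

With this shape, the pathological input from the reproducer becomes a long loop rather than deep recursion, so it can either succeed or fail with an ordinary error.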