Closed
Describe the bug
When an all-null Arrow array is serialized to Parquet, the null count in the column statistics is always 0.
To Reproduce
Steps to reproduce the behavior:
#[test]
fn statistics_null_counts_only_nulls() {
    // check that null-count statistics for "only NULL"-columns are correct
    let values = Arc::new(UInt64Array::from(vec![
        None,
        None,
    ]));
    let file = one_column_roundtrip("null_counts", values, true);

    // check statistics are valid
    let reader = SerializedFileReader::new(file).unwrap();
    let metadata = reader.metadata();
    assert_eq!(metadata.num_row_groups(), 1);
    let row_group = metadata.row_group(0);
    assert_eq!(row_group.num_columns(), 1);
    let column = row_group.column(0);
    let stats = column.statistics().unwrap();
    assert_eq!(stats.null_count(), 2); // <<< this fails, null count is 0
}

Expected behavior
For all-null columns the null-count should be the same as the number of rows.
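The expected invariant can be sketched without any Arrow or Parquet dependency. The snippet below is only an illustration: it models a column as a plain slice of `Option<u64>`, and `expected_null_count` is a hypothetical helper, not a function from the `arrow` or `parquet` crates.

```rust
/// Hypothetical helper (not part of the `arrow` or `parquet` crates):
/// counts the null entries of a column modeled as a slice of `Option<u64>`.
/// This is the value the written statistics would be expected to report.
fn expected_null_count(values: &[Option<u64>]) -> u64 {
    values.iter().filter(|v| v.is_none()).count() as u64
}
```

For the two-row all-null column from the reproduction, `expected_null_count(&[None, None])` yields 2, which matches the row count and is the value the failing assertion expects.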
Additional context
Tested on c863a2c44bffa5c092a49e07910d5e9225483193.
I am claiming this issue since I have a fix ready.