Closed
Description
Describe the bug
Inspection of the page index metadata shows that a ColumnIndex is written for columns whose values are all null, regardless of the setting of GenericColumnWriter::statistics_enabled.
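A hypothetical, simplified model may help explain why only an all-null column is affected. It assumes (this is not taken from the parquet crate's source) that the writer appends a ColumnIndex entry for an all-null page without checking the statistics level, while a non-null page without page statistics invalidates the whole index. Under that assumption, disabling statistics still leaves an index behind exactly when every page in the column is null. `StatsLevel`, `DataPage` and `column_index_written_reported` are invented names; `StatsLevel` only mirrors the crate's `EnabledStatistics` levels (`None`, `Chunk`, `Page`).

```rust
// Hypothetical model of per-page ColumnIndex bookkeeping (not the crate's code).
#[derive(Clone, Copy, PartialEq, Debug)]
enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// One data page: how many rows it holds and how many of them are null.
struct DataPage {
    num_rows: u64,
    num_nulls: u64,
}

/// Models the reported behavior: returns true if a ColumnIndex would be written.
fn column_index_written_reported(level: StatsLevel, pages: &[DataPage]) -> bool {
    let mut valid = true; // the index is abandoned once a page lacks statistics
    let mut entries = 0usize;
    for page in pages {
        let null_page = page.num_rows == page.num_nulls;
        if null_page {
            // Assumed bug: an all-null page gets an entry (empty min/max)
            // without consulting `level`.
            entries += 1;
        } else if level == StatsLevel::Page {
            entries += 1; // entry built from page statistics
        } else {
            valid = false; // no page statistics available, so drop the index
        }
    }
    valid && entries > 0
}

fn main() {
    let all_null = [DataPage { num_rows: 1000, num_nulls: 1000 }];
    let no_nulls = [DataPage { num_rows: 1000, num_nulls: 0 }];
    for level in [StatsLevel::None, StatsLevel::Chunk, StatsLevel::Page] {
        println!(
            "{:?}: all-null -> {}, no nulls -> {}",
            level,
            column_index_written_reported(level, &all_null),
            column_index_written_reported(level, &no_nulls)
        );
    }
}
```

In this model a column mixing null and non-null pages loses its index when statistics are disabled, so only columns like c_login (every page null) would show the symptom.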
To Reproduce
In the output below, produced by rewriting a file with --statistics-enabled none, the c_login column (all 1000 values null) has a ColumnIndex entry in addition to the expected OffsetIndex. No other column does.
% target/debug/parquet-rewrite -i parquet-testing/data/delta_byte_array.parquet -o test.parquet --statistics-enabled none
% pqmeta -i test.parquet
Rowgroup 0: num_rows:1000
-----------------------------------------------------------
c_customer_id
--------------------------------------------------
OffsetIndex:
page0: offset:20023 compressed_size:1280 first_row_index:0 var_bytes:16000
c_salutation
--------------------------------------------------
OffsetIndex:
page0: offset:21404 compressed_size:473 first_row_index:0 var_bytes:3145
c_first_name
--------------------------------------------------
OffsetIndex:
page0: offset:26992 compressed_size:1325 first_row_index:0 var_bytes:5650
c_last_name
--------------------------------------------------
OffsetIndex:
page0: offset:35182 compressed_size:1323 first_row_index:0 var_bytes:6011
c_preferred_cust_flag
--------------------------------------------------
OffsetIndex:
page0: offset:36567 compressed_size:235 first_row_index:0 var_bytes:971
c_birth_country
--------------------------------------------------
OffsetIndex:
page0: offset:39532 compressed_size:1097 first_row_index:0 var_bytes:8458
c_login
--------------------------------------------------
OffsetIndex:
page0: offset:40685 compressed_size:26 first_row_index:0 var_bytes:0
ColumnIndex: boundary_order:ASCENDING
page0: null_page:True min_val: max_val: null_count:1000
c_email_address
--------------------------------------------------
OffsetIndex:
page0: offset:71199 compressed_size:1334 first_row_index:0 var_bytes:26562
c_last_review_date
--------------------------------------------------
OffsetIndex:
page0: offset:76368 compressed_size:1200 first_row_index:0 var_bytes:6825
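For larger files, the same check can be done mechanically. The std-only Rust sketch below lists the columns that have a `ColumnIndex:` section in a pqmeta dump. It assumes the layout seen above (a column name underlined by a dashed rule, followed by OffsetIndex:/ColumnIndex: sections), which is inferred from this one output rather than from any pqmeta documentation; `columns_with_column_index` is a hypothetical helper.

```rust
/// Returns the names of columns that have a `ColumnIndex:` section in a
/// pqmeta-style dump (layout assumed from the output in this issue).
fn columns_with_column_index(dump: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut prev = "";
    let mut current: Option<&str> = None;
    for line in dump.lines().map(str::trim) {
        let is_rule = !line.is_empty() && line.chars().all(|c| c == '-');
        if is_rule && !prev.is_empty() && !prev.starts_with('-') {
            // A dashed rule underlines the header on the previous line.
            current = Some(prev);
        } else if line.starts_with("ColumnIndex:") {
            if let Some(name) = current {
                found.push(name.to_string());
            }
        }
        prev = line;
    }
    found
}

fn main() {
    let dump = "\
c_last_name
--------------------------------------------------
OffsetIndex:
page0: offset:35182 compressed_size:1323 first_row_index:0 var_bytes:6011
c_login
--------------------------------------------------
OffsetIndex:
page0: offset:40685 compressed_size:26 first_row_index:0 var_bytes:0
ColumnIndex: boundary_order:ASCENDING
page0: null_page:True min_val: max_val: null_count:1000
";
    println!("{:?}", columns_with_column_index(dump));
}
```

Run over the full dump above, it should report only c_login.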
Expected behavior
Columns containing only nulls should not get a ColumnIndex when page-level statistics are disabled, matching the behavior for every other column.
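The expected rule can be sketched as a check on the statistics level made before any page is examined, so all-null pages cannot bring an index back. This is a hypothetical model, not a patch to the parquet crate: `StatsLevel` again stands in for the crate's `EnabledStatistics`, pages are given as `(num_rows, num_nulls)` pairs, and `expected_column_index` is an invented name.

```rust
// Hypothetical sketch of the expected gating (not the crate's code).
#[derive(Clone, Copy, PartialEq, Debug)]
enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// Returns one ColumnIndex entry per page (`true` marks a null page), or
/// `None` when no ColumnIndex should be written at all.
fn expected_column_index(level: StatsLevel, pages: &[(u64, u64)]) -> Option<Vec<bool>> {
    // Gate on the setting first: without page-level statistics there is
    // no index, whether or not the pages are all null.
    if level != StatsLevel::Page {
        return None;
    }
    // A page is a null page when every row in it is null, as with
    // c_login's `null_page:True` entry.
    Some(pages.iter().map(|&(rows, nulls)| rows == nulls).collect())
}

fn main() {
    let c_login: [(u64, u64); 1] = [(1000, 1000)]; // all 1000 values null
    for level in [StatsLevel::None, StatsLevel::Chunk, StatsLevel::Page] {
        println!("{:?}: {:?}", level, expected_column_index(level, &c_login));
    }
}
```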