Parquet ColumnIndex for null columns is written even when statistics are disabled #6010

@etseidl

Describe the bug
Inspecting the page index metadata shows that a ColumnIndex is written for columns that contain only nulls, regardless of the GenericColumnWriter::statistics_enabled setting.

To Reproduce
In the output below, the c_login column has a ColumnIndex entry as well as the expected OffsetIndex. No other columns do.

% target/debug/parquet-rewrite -i parquet-testing/data/delta_byte_array.parquet -o test.parquet --statistics-enabled none
% pqmeta -i test.parquet
Rowgroup 0: num_rows:1000
-----------------------------------------------------------

c_customer_id
--------------------------------------------------
OffsetIndex:
  page0: offset:20023 compressed_size:1280 first_row_index:0 var_bytes:16000


c_salutation
--------------------------------------------------
OffsetIndex:
  page0: offset:21404 compressed_size:473 first_row_index:0 var_bytes:3145


c_first_name
--------------------------------------------------
OffsetIndex:
  page0: offset:26992 compressed_size:1325 first_row_index:0 var_bytes:5650


c_last_name
--------------------------------------------------
OffsetIndex:
  page0: offset:35182 compressed_size:1323 first_row_index:0 var_bytes:6011


c_preferred_cust_flag
--------------------------------------------------
OffsetIndex:
  page0: offset:36567 compressed_size:235 first_row_index:0 var_bytes:971


c_birth_country
--------------------------------------------------
OffsetIndex:
  page0: offset:39532 compressed_size:1097 first_row_index:0 var_bytes:8458


c_login
--------------------------------------------------
OffsetIndex:
  page0: offset:40685 compressed_size:26 first_row_index:0 var_bytes:0

ColumnIndex: boundary_order:ASCENDING
  page0: null_page:True min_val: max_val: null_count:1000

c_email_address
--------------------------------------------------
OffsetIndex:
  page0: offset:71199 compressed_size:1334 first_row_index:0 var_bytes:26562


c_last_review_date
--------------------------------------------------
OffsetIndex:
  page0: offset:76368 compressed_size:1200 first_row_index:0 var_bytes:6825

Expected behavior
Null columns should not have a ColumnIndex present when page statistics are not enabled.
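
The expected rule can be sketched as a single gate on the statistics level. This is an illustration only, not the crate's actual writer code: `StatsLevel` is a local stand-in that mirrors the three levels of the statistics setting (none, chunk, page), and `PageSummary`, `ColumnIndexSketch`, and `build_column_index` are hypothetical names.

```rust
/// Local stand-in for the writer's statistics level (assumption: three
/// levels, with page-level statistics required for a ColumnIndex).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// Hypothetical per-page summary collected while writing a column.
#[derive(Clone, Debug)]
struct PageSummary {
    null_page: bool,
    null_count: u64,
}

/// Hypothetical, simplified ColumnIndex holding only the null information.
#[derive(Clone, Debug, PartialEq)]
struct ColumnIndexSketch {
    null_pages: Vec<bool>,
    null_counts: Vec<u64>,
}

/// Build a ColumnIndex only when page statistics are enabled. The check
/// comes first, so all-null pages get no special treatment.
fn build_column_index(
    level: StatsLevel,
    pages: &[PageSummary],
) -> Option<ColumnIndexSketch> {
    if level != StatsLevel::Page {
        return None;
    }
    Some(ColumnIndexSketch {
        null_pages: pages.iter().map(|p| p.null_page).collect(),
        null_counts: pages.iter().map(|p| p.null_count).collect(),
    })
}
```

Under this sketch, a column like c_login (one all-null page with null_count 1000) yields no ColumnIndex at the none or chunk level, and yields one only at the page level.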

Labels: bug, parquet