Cannot read encrypted Parquet file if page index reading is enabled #7629

@adamreeve

Describe the bug

Reading a Parquet file that uses modular encryption fails when page index reading is enabled in the ArrowReaderOptions, with an error like:

ArrowError("Parquet argument error: External: bad data")

To Reproduce

This test reproduces the issue when added to parquet/tests/encryption/encryption_async.rs:

#[tokio::test]
async fn test_read_with_page_index() {
    // Encrypted test file from the Parquet test data directory.
    let test_data = arrow::util::test_util::parquet_test_data();
    let path = format!("{test_data}/uniform_encryption.parquet.encrypted");
    let mut file = File::open(&path).await.unwrap();

    // 16-byte key passed to the decryption properties builder.
    let key_code: &[u8] = "0123456789012345".as_bytes();
    let decryption_properties = FileDecryptionProperties::builder(key_code.to_vec())
        .build()
        .unwrap();

    // Enabling the page index together with decryption triggers the error.
    let options = ArrowReaderOptions::new()
        .with_file_decryption_properties(decryption_properties)
        .with_page_index(true);

    let arrow_metadata = ArrowReaderMetadata::load_async(&mut file, options)
        .await
        .unwrap();

    let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata(
        file,
        arrow_metadata,
    )
    .build()
    .unwrap();
    // try_collect comes from futures::TryStreamExt.
    let _record_batches = record_reader.try_collect::<Vec<_>>().await.unwrap();
}

Expected behavior

Data should be read successfully and give the same results as when with_page_index(false) is used.
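
The expected behavior can be phrased as a property: reading with and without the page index should both succeed and produce equal output. The following is only a minimal, std-only sketch of such a check, not part of the parquet crate. The function name `check_page_index_invariance` and the `read` closure are hypothetical; `read` stands in for the reader setup from the reproduction above, with its `bool` argument being the value passed to `with_page_index`.

```rust
use std::fmt::Debug;

/// Runs `read` once with the page index disabled and once with it enabled,
/// and reports whether both runs succeed with equal output.
///
/// `read` is a hypothetical stand-in for the reader setup in the reproduction;
/// its `bool` argument is the value that would be passed to `with_page_index`.
fn check_page_index_invariance<T, E, F>(read: F) -> Result<(), String>
where
    T: PartialEq + Debug,
    E: Debug,
    F: Fn(bool) -> Result<T, E>,
{
    // Baseline: the configuration that already works for encrypted files.
    let baseline = read(false)
        .map_err(|e| format!("read without page index failed: {e:?}"))?;
    // Configuration under test: the one that fails in this report.
    let indexed = read(true)
        .map_err(|e| format!("read with page index failed: {e:?}"))?;

    if baseline == indexed {
        Ok(())
    } else {
        Err(format!("results differ: {baseline:?} vs {indexed:?}"))
    }
}
```

In a real test, the closure would presumably build the ArrowReaderOptions with the given flag and collect the record batches. With the behavior described in this issue, the check would return an error from the second read rather than `Ok(())`.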

Additional context

This was encountered by @corwinjoy when integrating encryption support in DataFusion, where page indexes are enabled when data is queried with a filter predicate.

Labels: bug, parquet