Closed
Description
Describe the bug
Reading a Parquet file that uses modular encryption fails when page indexes are enabled in the ArrowReaderOptions, with an error like:
ArrowError("Parquet argument error: External: bad data")
To Reproduce
This test reproduces the issue when added to parquet/tests/encryption/encryption_async.rs:
#[tokio::test]
async fn test_read_with_page_index() {
    let test_data = arrow::util::test_util::parquet_test_data();
    let path = format!("{test_data}/uniform_encryption.parquet.encrypted");
    let mut file = File::open(&path).await.unwrap();

    let key_code: &[u8] = "0123456789012345".as_bytes();
    let decryption_properties = FileDecryptionProperties::builder(key_code.to_vec())
        .build()
        .unwrap();

    let options = ArrowReaderOptions::new()
        .with_file_decryption_properties(decryption_properties)
        .with_page_index(true);

    let arrow_metadata = ArrowReaderMetadata::load_async(&mut file, options)
        .await
        .unwrap();

    let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata(
        file,
        arrow_metadata,
    )
    .build()
    .unwrap();

    let _record_batches = record_reader.try_collect::<Vec<_>>().await.unwrap();
}
Expected behavior
Data should be read successfully, and give the same results as when with_page_index(false) is used.
Additional context
This was encountered by @corwinjoy when integrating encryption support in DataFusion. Page indexes are enabled when data is queried with a filter predicate.