Closed
Labels: enhancement, parquet
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
My friends wrote a data format, lance, that can be converted to and from Parquet, and I want to write another implementation of it in Rust.
However, their implementation uses Arrow extension types, which do not seem to be supported in arrow-rs yet.
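For context, an Arrow extension type is not a new physical type. The field keeps its storage type (here `pa.binary(16)`, a fixed-size binary of 16 bytes) and records the extension in the field's metadata under the reserved keys `ARROW:extension:name` and `ARROW:extension:metadata`, as defined by the Arrow columnar format. The std-only sketch below models that idea. The `Field` struct and `extension_name` helper are hypothetical illustrations, not the arrow-rs API, and the name `arrow.py_extension_type` is assumed to be what `print(uuid_type.extension_name)` reports for a `PyExtensionType`.

```rust
use std::collections::HashMap;

// Reserved field-metadata keys from the Arrow columnar format specification.
const EXT_NAME_KEY: &str = "ARROW:extension:name";
const EXT_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical, simplified field: a storage type plus key/value metadata.
struct Field {
    name: String,
    storage_type: String,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Returns the extension type name if this field carries one.
    fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXT_NAME_KEY).map(|s| s.as_str())
    }
}

fn main() {
    let mut metadata = HashMap::new();
    // Assumed name reported by pyarrow's PyExtensionType.
    metadata.insert(EXT_NAME_KEY.to_string(), "arrow.py_extension_type".to_string());
    // Placeholder text only: the real payload for PyExtensionType is a serialized
    // (binary) type description, which a String cannot hold.
    metadata.insert(EXT_METADATA_KEY.to_string(), "<serialized type>".to_string());

    let field = Field {
        name: "uuid".to_string(),
        storage_type: "FixedSizeBinary(16)".to_string(),
        metadata,
    };
    // A reader that does not know the extension can still fall back to the storage type.
    println!(
        "{}: {} (extension: {:?})",
        field.name,
        field.storage_type,
        field.extension_name()
    );
}
```

The design point is that extension support is additive: a reader unaware of a given extension can ignore the metadata and still read the column as plain fixed-size binary.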
Describe the solution you'd like
Allow the reader to convert a Parquet file containing an extension type to Arrow.
Describe alternatives you've considered
Additional context
Python code I used to generate such a file.
```python
# with pyarrow-9.0.0
import uuid

import pyarrow as pa
import pyarrow.parquet as pq


class UuidType(pa.PyExtensionType):
    def __init__(self):
        # Storage type: fixed-size binary of 16 bytes (one UUID per value)
        pa.PyExtensionType.__init__(self, pa.binary(16))

    def __reduce__(self):
        return UuidType, ()


if __name__ == '__main__':
    uuid_type = UuidType()
    print(uuid_type.extension_name)
    print(uuid_type.storage_type)

    storage_array = pa.array([uuid.uuid4().bytes for _ in range(4)], pa.binary(16))
    arr = pa.ExtensionArray.from_storage(uuid_type, storage_array)
    print(arr)

    table = pa.Table.from_arrays([arr], names=["uuid"])
    pq.write_table(table, "extension_example.parquet")

    # successfully read and print
    parquet_table = pq.read_table('extension_example.parquet')
    print("schema", parquet_table.schema)
    print("table", parquet_table)
```

Rust code that fails to read the file:
```rust
// parquet 19.0.0: https://docs.rs/parquet/19.0.0/parquet/arrow/index.html
use parquet::arrow::{ArrowReader, ParquetFileArrowReader, ProjectionMask};
use std::fs::File;

#[test]
fn test_convert() {
    let input_file_name = "extension_example.parquet";
    let file = File::open(input_file_name).unwrap();
    let mut arrow_reader = ParquetFileArrowReader::try_new(file).unwrap();
    // Project only the first leaf column (not used further in this reproduction).
    let _mask = ProjectionMask::leaves(arrow_reader.parquet_schema(), [0]);
    println!("parquet schema is: {:?}", arrow_reader.parquet_schema());
    // The Arrow schema is derived from the Parquet schema plus the embedded ARROW:schema hint, if present.
    println!("Converted arrow schema is: {}", arrow_reader.get_schema().unwrap());
}
```

Error log:
```
thread 'tests::test_convert' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to get root as message stored in ARROW:schema: Utf8Error { error: Utf8Error { valid_up_to: 0, error_len: Some(1) }, range: 216..255, error_trace: ErrorTrace([TableField { field_name: \"value\", position: 208 }, VectorElement { index: 0, position: 116 }, TableField { field_name: \"custom_metadata\", position: 92 }, VectorElement { index: 0, position: 48 }, TableField { field_name: \"fields\", position: 40 }, UnionVariant { variant: \"MessageHeader::Schema\", position: 24 }, TableField { field_name: \"header\", position: 24 }]) }")', src/lib.rs:39:77
```
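One plausible reading of the trace: the string that fails verification is the `value` of the first `custom_metadata` entry on the first schema field, presumably the extension metadata. In the Arrow flatbuffers schema, `KeyValue.value` is a string, and the verifier rejects it because it is not valid UTF-8. Assuming `PyExtensionType` serializes the type with `pickle` (protocol 2 and later start the payload with the byte `0x80`, which can never begin a UTF-8 sequence), this matches `valid_up_to: 0, error_len: Some(1)`. The std-only sketch below reproduces that error shape; `check_utf8` is a hypothetical helper, and the byte prefix is an illustrative assumption, not bytes read from the file.

```rust
/// Checks whether a metadata value could be stored as a flatbuffers string,
/// which must be valid UTF-8. On failure returns (valid_up_to, error_len).
fn check_utf8(value: &[u8]) -> Result<(), (usize, Option<usize>)> {
    std::str::from_utf8(value)
        .map(|_| ())
        .map_err(|e| (e.valid_up_to(), e.error_len()))
}

fn main() {
    // Assumed prefix of a pickle protocol-4 payload: 0x80 is the PROTO opcode,
    // a UTF-8 continuation byte that is invalid as the first byte of a sequence.
    let pickled_prefix: &[u8] = &[0x80, 0x04, 0x95];
    // Same shape as the log: valid_up_to 0, error_len Some(1).
    println!("{:?}", check_utf8(pickled_prefix));
    // A plain ASCII extension name passes the same check.
    println!("{:?}", check_utf8(b"arrow.py_extension_type"));
}
```

If this reading is right, the problem is not extension types as such but binary bytes in a field that arrow-rs's verifier requires to be UTF-8.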