Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
In #7562 @devinjdangelo added the (really neat) feature to write a single parquet file in parallel.
This feature is enabled by a feature flag (`allow_single_file_parallelism`), which defaults to off.
We haven't turned it on by default yet because the resulting parquet files don't have the index structures (bloom filter, column_index, and offset_index) needed for high performance (see the details in this conversation: https://github.com/apache/arrow-datafusion/pull/7562/files#r1327037733).
Describe the solution you'd like
I would like the created parquet files to have the necessary index structures -- apache/arrow-rs#4823 tracks adding such an API upstream in arrow-rs.
Describe alternatives you've considered
No response
Additional context
No response