Standardize creation and configuration of parquet --> Arrow readers (ParquetRecordBatchReaderBuilder) #2427

@tustvold

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, a ParquetFileArrowReader is created from a ChunkReader or an Arc<dyn FileReader>, plus an optional set of ArrowReaderOptions. ArrowReader can then be used to obtain a ParquetRecordBatchReader from it.

Not only is this interface deeply confusing, it is also unclear how to extend it to support functionality such as row filtering and predicate pushdown. These features need schema information before they can be configured, and that information is only available after the file has been opened.

Describe the solution you'd like

I would like a ParquetRecordBatchReaderBuilder, similar to ParquetRecordBatchStreamBuilder.
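To illustrate the shape of interface being requested, here is a minimal sketch of a two-phase builder: opening the input produces a builder that already exposes the schema, options are then chosen using that schema, and only `build()` produces the batch reader. It uses only the standard library. Every type and method here (`Schema`, `MockFile`, `ReaderBuilder`, `BatchReader`, `with_projection`, and so on) is a hypothetical stand-in invented for illustration, not the parquet crate's API.

```rust
/// Hypothetical stand-in for the schema stored in a file's metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical stand-in for an opened file: a schema plus row-major data.
pub struct MockFile {
    pub schema: Schema,
    pub rows: Vec<Vec<i64>>,
}

/// Phase one: the builder is created by "opening" the file, so the schema
/// is available before any reader options are chosen.
pub struct ReaderBuilder {
    file: MockFile,
    batch_size: usize,
    projection: Option<Vec<usize>>,
}

impl ReaderBuilder {
    pub fn try_new(file: MockFile) -> Result<Self, String> {
        let width = file.schema.columns.len();
        if file.rows.iter().any(|row| row.len() != width) {
            return Err("row width does not match schema".to_string());
        }
        Ok(Self { file, batch_size: 1024, projection: None })
    }

    /// Schema access prior to building is what makes projections and
    /// filters expressible in terms of the file's actual columns.
    pub fn schema(&self) -> &Schema {
        &self.file.schema
    }

    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    pub fn with_projection(mut self, columns: Vec<usize>) -> Self {
        self.projection = Some(columns);
        self
    }

    /// Phase two: validate the options against the schema and build the reader.
    pub fn build(self) -> Result<BatchReader, String> {
        if self.batch_size == 0 {
            return Err("batch size must be non-zero".to_string());
        }
        let width = self.file.schema.columns.len();
        let projection = self.projection.unwrap_or_else(|| (0..width).collect());
        if projection.iter().any(|&i| i >= width) {
            return Err("projection index out of range".to_string());
        }
        Ok(BatchReader {
            rows: self.file.rows,
            projection,
            batch_size: self.batch_size,
            position: 0,
        })
    }
}

/// Yields batches of projected rows, at most `batch_size` rows per batch.
pub struct BatchReader {
    rows: Vec<Vec<i64>>,
    projection: Vec<usize>,
    batch_size: usize,
    position: usize,
}

impl Iterator for BatchReader {
    type Item = Vec<Vec<i64>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.position >= self.rows.len() {
            return None;
        }
        let end = (self.position + self.batch_size).min(self.rows.len());
        let batch: Vec<Vec<i64>> = self.rows[self.position..end]
            .iter()
            .map(|row| self.projection.iter().map(|&i| row[i]).collect())
            .collect();
        self.position = end;
        Some(batch)
    }
}

/// Usage: look up a column by name in the schema, then configure and build.
pub fn example() -> Result<Vec<Vec<Vec<i64>>>, String> {
    let file = MockFile {
        schema: Schema { columns: vec!["id".to_string(), "value".to_string()] },
        rows: vec![vec![1, 10], vec![2, 20], vec![3, 30]],
    };
    let builder = ReaderBuilder::try_new(file)?;
    let value_idx = builder
        .schema()
        .columns
        .iter()
        .position(|name| name == "value")
        .ok_or_else(|| "missing column".to_string())?;
    let reader = builder
        .with_batch_size(2)
        .with_projection(vec![value_idx])
        .build()?;
    Ok(reader.collect())
}
```

The point of the split is that anything depending on the schema, such as a projection or a row filter, can be computed between `try_new` and `build`, which the single-step construction described above does not allow.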

Describe alternatives you've considered

Additional context

Labels

enhancement, parquet
