Rewrite ParquetRecordBatchReader (sync api) in terms of the PushDecoder #8678

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We added the ParquetPushDecoder in #7997

One of the rationales is to avoid duplicating the control logic between the Async Reader and the Sync Reader.

However, it actually (temporarily) makes the problem worse by adding a third copy of the control logic, à la the xkcd "standards" effect.

Once we have completed this ticket and the following one, there will be one control loop:

Describe the solution you'd like

Rewrite ParquetRecordBatchReader using the ParquetPushDecoder.
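
To make the intended shape concrete, here is a minimal, self-contained sketch of a synchronous reader driven by a push-style decoder. It is a toy model, not the arrow-rs API: `ToyPushDecoder`, `Step`, and `read_all` are hypothetical names invented for illustration, and a "batch" here is just the raw bytes of one row group's range. The point is only that the sync reader reduces to a loop that answers the decoder's data requests with blocking reads.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Hypothetical outcome of one decode step (illustrative only).
#[derive(Debug, PartialEq)]
enum Step {
    /// The decoder cannot proceed until these byte ranges are pushed.
    NeedsData(Vec<Range<usize>>),
    /// One decoded "batch" (here: the raw bytes of a row group).
    Batch(Vec<u8>),
    /// All row groups have been decoded.
    Finished,
}

/// Toy push decoder: owns the control logic, performs no IO itself.
struct ToyPushDecoder {
    /// Byte ranges of the row groups still to decode, in order.
    pending: VecDeque<Range<usize>>,
    /// Bytes pushed for the row group at the front of `pending`, if any.
    buffered: Option<Vec<u8>>,
}

impl ToyPushDecoder {
    fn new(row_groups: Vec<Range<usize>>) -> Self {
        Self { pending: row_groups.into(), buffered: None }
    }

    /// Accept data for the range the decoder is currently waiting on.
    /// Data for any other range is ignored in this toy model.
    fn push_range(&mut self, range: Range<usize>, data: Vec<u8>) {
        if self.pending.front() == Some(&range) {
            self.buffered = Some(data);
        }
    }

    /// Advance as far as possible with the data pushed so far.
    fn try_decode(&mut self) -> Step {
        if let Some(data) = self.buffered.take() {
            self.pending.pop_front();
            return Step::Batch(data);
        }
        match self.pending.front() {
            Some(range) => Step::NeedsData(vec![range.clone()]),
            None => Step::Finished,
        }
    }
}

/// Synchronous driver: the only code that touches the "file".
/// Each `NeedsData` request is satisfied with a blocking slice read.
fn read_all(file: &[u8], mut decoder: ToyPushDecoder) -> Vec<Vec<u8>> {
    let mut batches = Vec::new();
    loop {
        match decoder.try_decode() {
            Step::NeedsData(ranges) => {
                for range in ranges {
                    let data = file[range.clone()].to_vec();
                    decoder.push_range(range, data);
                }
            }
            Step::Batch(batch) => batches.push(batch),
            Step::Finished => return batches,
        }
    }
}
```

In this model an async reader would reuse `ToyPushDecoder` unchanged and differ only in how it fetches the requested ranges, which is the deduplication this ticket is after.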

Describe alternatives you've considered

The IO pattern in the sync decoder differs from that of the async decoder: it evaluates predicates on all row groups first and then decodes each row group, so adapting it to the push decoder will take some finagling.

Additional context

Metadata

Labels: enhancement (any new improvement worthy of an entry in the changelog)
