Re-sort file groups in FileScanConfig to satisfy ordering requirements #19724

`FileScanConfig::try_pushdown_sort` could support re-sorting or re-arranging the `FileGroup`s themselves, using min/max statistics, to satisfy the query's preferred sort order.

This is described in section 5.3 of [Pruning in Snowflake: Working Smarter, Not Harder](https://arxiv.org/pdf/2504.11540).

Some considerations are:
- If we start rebuilding groups, what should the parallelism be? On the one hand, it would make sense to try to match the original parallelism. On the other hand, that may not be possible: if we can only satisfy the sort ordering by making groups `[[f1, f2, f3], [f4]]`, it may be worth having lopsided groups, or fewer or more groups. Matching the original parallelism may not even be optimal: in a TopK query, reduced parallelism can lead to faster queries if we end up scanning only 1 group or even 1 file, since all of the work opening the others is wasted effort. This is also known as `ProgressiveEval` and is discussed in https://github.com/apache/datafusion/issues/15191.
- 
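
To make the regrouping idea concrete, here is a minimal sketch of one possible approach, not DataFusion's actual implementation. It assumes each file carries a single-column `i64` min/max pair. The `FileRange` struct and `regroup_by_min_max` function are hypothetical names invented for illustration. Files are sorted by their minimum value, then each file is placed greedily into the first group whose last file ends at or before the new file's minimum, so every group is internally ordered and non-overlapping. The number of groups falls out of the data rather than being fixed up front, which is exactly the parallelism question raised above.

```rust
// Hypothetical stand-in for a file plus its min/max statistics on the sort column.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    name: &'static str,
    min: i64,
    max: i64,
}

// Greedy regrouping: sort files by min, then append each file to the first
// group whose last file's max is <= this file's min. Files that overlap every
// existing group start a new group.
fn regroup_by_min_max(mut files: Vec<FileRange>) -> Vec<Vec<FileRange>> {
    files.sort_by_key(|f| (f.min, f.max));
    let mut groups: Vec<Vec<FileRange>> = Vec::new();
    for file in files {
        let slot = groups
            .iter()
            .position(|g| g.last().map_or(true, |last| last.max <= file.min));
        match slot {
            Some(i) => groups[i].push(file),
            None => groups.push(vec![file]),
        }
    }
    groups
}

fn main() {
    // f4 overlaps f1, f2 and f3, so it cannot share an ordered group with them.
    let files = vec![
        FileRange { name: "f4", min: 5, max: 25 },
        FileRange { name: "f2", min: 10, max: 19 },
        FileRange { name: "f1", min: 0, max: 9 },
        FileRange { name: "f3", min: 20, max: 29 },
    ];
    for group in regroup_by_min_max(files) {
        let names: Vec<&str> = group.iter().map(|f| f.name).collect();
        println!("{:?}", names);
    }
}
```

With these made-up ranges the sketch yields the lopsided grouping `[[f1, f2, f3], [f4]]` from the example above. A real implementation would also have to handle multi-column sort keys, nulls, descending orders, and missing statistics, none of which this sketch attempts.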
