[Epic] A collection of items related to processing larger than memory datasets (via spilling, externalized algorithm, etc)

### Is your feature request related to a problem or challenge?

This epic organizes efforts to improve DataFusion's ability to process datasets that are larger than the configured memory budget.

Some of DataFusion's "pipeline blocking" operations (SortExec and HashGroupBy) already work with datasets that do not fit in memory, but their performance and usability could be improved.
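For readers unfamiliar with spilling, here is a minimal sketch of the general technique (an external merge sort): sort fixed-size chunks in memory, spill each sorted run to disk, then stream a k-way merge over the runs. This is not DataFusion's implementation. The names `external_sort`, `spill_run`, `merge_runs`, and `next_value`, the row-count `budget`, and the one-integer-per-line text spill format are all assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Lines, Write};
use std::path::{Path, PathBuf};

/// Sorts `input` while buffering at most `budget` values at a time.
/// Sorted values are streamed to `emit` rather than collected in memory.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn external_sort(
    input: impl Iterator<Item = i64>,
    budget: usize,
    spill_dir: &Path,
    mut emit: impl FnMut(i64),
) -> io::Result<()> {
    assert!(budget > 0, "budget must allow at least one value");
    let mut buffer: Vec<i64> = Vec::with_capacity(budget);
    let mut runs: Vec<PathBuf> = Vec::new();

    for value in input {
        buffer.push(value);
        if buffer.len() == budget {
            // Budget exhausted: sort the buffer and spill it as one run.
            let id = runs.len();
            runs.push(spill_run(&mut buffer, spill_dir, id)?);
        }
    }
    if !buffer.is_empty() {
        let id = runs.len();
        runs.push(spill_run(&mut buffer, spill_dir, id)?);
    }

    merge_runs(&runs, &mut emit)?;

    // Remove the spill files once the merge has finished.
    for path in &runs {
        fs::remove_file(path)?;
    }
    Ok(())
}

/// Sorts the buffer, writes it to a new spill file, and clears the buffer.
fn spill_run(buffer: &mut Vec<i64>, dir: &Path, id: usize) -> io::Result<PathBuf> {
    buffer.sort_unstable();
    let path = dir.join(format!("run-{}.txt", id));
    let mut out = BufWriter::new(File::create(&path)?);
    for v in buffer.iter() {
        writeln!(out, "{}", v)?;
    }
    out.flush()?;
    buffer.clear();
    Ok(path)
}

/// K-way merge: only one pending value per run is held in the heap.
fn merge_runs(runs: &[PathBuf], emit: &mut impl FnMut(i64)) -> io::Result<()> {
    let mut readers: Vec<Lines<BufReader<File>>> = Vec::new();
    for path in runs {
        readers.push(BufReader::new(File::open(path)?).lines());
    }

    // `Reverse` turns the max-heap into a min-heap keyed on (value, run index).
    let mut heap = BinaryHeap::new();
    for (i, reader) in readers.iter_mut().enumerate() {
        if let Some(v) = next_value(reader)? {
            heap.push(Reverse((v, i)));
        }
    }
    while let Some(Reverse((v, i))) = heap.pop() {
        emit(v);
        // Refill the heap from the run that just produced the minimum.
        if let Some(next) = next_value(&mut readers[i])? {
            heap.push(Reverse((next, i)));
        }
    }
    Ok(())
}

/// Reads and parses the next value from a run, or `None` at end of file.
fn next_value(lines: &mut Lines<BufReader<File>>) -> io::Result<Option<i64>> {
    match lines.next() {
        None => Ok(None),
        Some(line) => {
            let value = line?
                .trim()
                .parse::<i64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            Ok(Some(value))
        }
    }
}
```

In this sketch the memory budget is a row count and the merge opens every run at once. A real engine would more likely track bytes against a memory pool, spill in a columnar format, and may need multiple merge passes when there are many runs.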


- [x] https://github.com/apache/datafusion/issues/12136
- [ ] https://github.com/apache/datafusion/issues/13123
- [x] https://github.com/apache/datafusion/issues/14078
- [x] https://github.com/apache/datafusion/issues/14692
- [ ] https://github.com/apache/datafusion/issues/14851
- [x] https://github.com/apache/datafusion/issues/15271



Note: Joins are another operation that can run out of memory. Today they return an error rather than falling back to another strategy (Sort-Merge-Join, for example). If people are interested in making this better, I think we could organize another project.


### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

[Epic] A collection of items related to processing larger than memory datasets (via spilling, externalized algorithm, etc) #14077
