Skip to content

Allow inserts to a sorted ListingTable #7354

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

As of now, you can

  1. create an external table (implemented by ListingTable) that points at a local directory and can data to it which makes new files
  2. create an external table (implemented by ListingTable) that points at a local directory with a declared sort order and datafusion will take advantage of that order!

Sadly you can not do both together -- insert data into external table that has had a sort order declared. For example:

$ mkdir output
$ datafusion-cli
DataFusion CLI v29.0.0
❯ create external table output(time timestamp) stored as parquet location 'output' with order (time);
0 rows in set. Query took 0.002 seconds.

❯ insert into output values (now());
This feature is not implemented: Writing to a sorted listing table via insert into is not supported yet. To write to this table in the meantime, register an equivalent table with file_sort_order = vec![]

Describe the solution you'd like

From @devinjdangelo comments in #6569 (comment)

In the case of appending new files to a directory, I think it is as simple as having FileSinkExec require its input be sorted. DataFusion's optimizer should do the rest to ensure the new file is sorted properly.

In the case of a single file (LOCATION 'foo.parquet' for example), likely can't be handled efficiently as doing so would require reading the existing file, merging that with the new data and rewriting the whole file.

Describe alternatives you've considered

Alternatively, we could have a check to see if 1) the table is sorted and 2) the input to FileSinkExec is sorted. If 1) is true but 2) is not, we would need to update the metadata about the table to indicate for subsequent queries it is no longer guaranteed to be sorted.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions