Dataframe API should expose mechanism to return sorted results. #7414

@jleibs

Our dataframe APIs currently return rows that are only implicitly sorted by (chunk_time, row_time); they are not globally sorted on row_time.
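
To make the difference concrete, here is a minimal sketch in plain Python. It does not use the real dataframe API: the `chunks` data is made up, and it assumes that "chunk_time" means the first row time in a chunk, which is an assumption made for illustration.

```python
# Hypothetical data: each chunk is a list of row times, sorted within the chunk.
chunks = [
    [1, 4, 7],  # chunk whose first row is at time 1
    [2, 3, 9],  # chunk whose first row is at time 2; it overlaps the first chunk
]

# Order by (chunk_time, row_time): visit chunks by their first row time
# (assumed meaning of chunk_time), then rows in order within each chunk.
traversal_order = [t for chunk in sorted(chunks, key=lambda c: c[0]) for t in chunk]

# Global order: every row sorted by its own time, regardless of chunk.
global_order = sorted(traversal_order)

print(traversal_order)  # [1, 4, 7, 2, 3, 9]
print(global_order)     # [1, 2, 3, 4, 7, 9]
```

When chunks do not overlap in time the two orders coincide, which is why the difference often goes unnoticed.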

For many applications this doesn't matter or won't be evident, which makes it all the more surprising when results turn out not to be globally sorted.

To that end, we need a configuration option on the query API that indicates whether to sort globally (potentially expensive) or to return results in optimal traversal order.
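
One possible shape for such an option, as a rough sketch only. `RowOrder`, `QueryOptions`, and `order_rows` are hypothetical names invented here, not part of any existing API, and the default shown is an arbitrary choice.

```python
from dataclasses import dataclass
from enum import Enum


class RowOrder(Enum):
    """Hypothetical setting for how result rows are ordered."""

    TRAVERSAL = "traversal"  # cheapest: return rows in traversal order
    GLOBAL = "global"        # potentially expensive: sort all rows by row time


@dataclass(frozen=True)
class QueryOptions:
    """Hypothetical query configuration; the default is an assumption."""

    row_order: RowOrder = RowOrder.TRAVERSAL


def order_rows(row_times, options):
    """Return row times in the order requested by the options."""
    if options.row_order is RowOrder.GLOBAL:
        return sorted(row_times)
    return list(row_times)  # leave traversal order untouched
```

For example, `order_rows([1, 4, 2], QueryOptions(RowOrder.GLOBAL))` sorts the rows, while `order_rows([1, 4, 2], QueryOptions())` returns them as they came.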

Sorting has a few steps we need to consider:

  • Determining the proper sort order
  • Slicing pov_chunks into sorted slices, which can lead to smaller (potentially unit-length) slices.
  • Eventually merging those slices back together efficiently
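
The steps above could be sketched roughly as follows. This is not how the query engine is implemented: chunks are modeled as plain lists of row times, the slicing step is simplified to splitting each chunk into non-decreasing runs, and the merge uses the standard library's `heapq.merge` as a stand-in for an efficient k-way merge. `sorted_slices` and `merge_chunks` are hypothetical helper names.

```python
import heapq


def sorted_slices(row_times):
    """Split one chunk's row times into maximal non-decreasing runs."""
    slices, current = [], []
    for t in row_times:
        # A drop in time ends the current sorted run and starts a new one.
        if current and t < current[-1]:
            slices.append(current)
            current = []
        current.append(t)
    if current:
        slices.append(current)
    return slices


def merge_chunks(chunks):
    """Slice every chunk into sorted runs, then k-way merge all the runs."""
    runs = [run for chunk in chunks for run in sorted_slices(chunk)]
    return list(heapq.merge(*runs))


# Hypothetical chunks; the last one is not sorted internally.
chunks = [[1, 4, 7], [2, 3, 9], [8, 5, 6]]

print(sorted_slices([8, 5, 6]))  # [[8], [5, 6]]
print(merge_chunks(chunks))      # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note how the unsorted chunk produces a unit-length slice (`[8]`); heavily interleaved data can degrade into many such tiny slices, which is where the cost of the merge step comes from.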

If we don't land this for 0.19, we need to clearly document the behavior.

Labels: enhancement
