Dataframe API should expose mechanism to return sorted results. #7414
Closed
Labels
enhancement (New feature or request)
Description
Our dataframe APIs currently return rows that are implicitly sorted by (chunk_time, row_time), not globally sorted by row_time.
For many applications this doesn't matter or won't be evident, which makes it all the more surprising when the results turn out not to be globally sorted.
To that end, we need an API configuration on the query that indicates whether to sort globally (potentially expensive) or to return the results in optimal traversal order.
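To make the difference concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `RowOrder` option, the `read_rows` function, and the modeling of a chunk as a `(chunk_time, row_times)` pair are assumptions, not the existing API.

```python
from enum import Enum


class RowOrder(Enum):
    """Hypothetical query option; not part of the current API."""
    TRAVERSAL = "traversal"  # cheap: follow (chunk_time, row_time)
    GLOBAL = "global"        # potentially expensive: sort on row_time


# Assumption: a chunk is (chunk_time, [row_time, ...]) with rows sorted
# inside the chunk, and the two chunks overlap in time.
chunks = [
    (10, [10, 30, 50]),
    (20, [20, 40, 60]),
]


def read_rows(chunks, order=RowOrder.TRAVERSAL):
    """Return row times in the requested order."""
    # Traversal order: chunk by chunk, rows in order within each chunk.
    rows = [
        (chunk_time, row_time)
        for chunk_time, row_times in sorted(chunks)
        for row_time in row_times
    ]
    if order is RowOrder.GLOBAL:
        # Naive global sort; a real implementation would merge instead.
        rows.sort(key=lambda row: row[1])
    return [row_time for _, row_time in rows]


traversal = read_rows(chunks)
global_sorted = read_rows(chunks, RowOrder.GLOBAL)
print(traversal)      # [10, 30, 50, 20, 40, 60]
print(global_sorted)  # [10, 20, 30, 40, 50, 60]
```

With non-overlapping chunks both orders coincide, which is why the current behavior often goes unnoticed.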
Sorting has a few steps we need to consider:
- Determining the proper sort order
- Slicing pov_chunks into sorted slices, which can produce much smaller slices (potentially a single row each).
- Eventually merging those slices back together efficiently
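One possible reading of these steps, sketched in Python. This is an assumption-laden illustration rather than the intended implementation: chunks are modeled as plain lists of row times, `sorted_slices` and `merge_globally` are hypothetical helpers, and the merge uses the standard library's `heapq.merge` as a stand-in for whatever k-way merge is eventually chosen.

```python
import heapq


def sorted_slices(row_times):
    """Split one chunk's row times into maximal non-decreasing runs."""
    runs, current = [], []
    for t in row_times:
        # A drop in time ends the current sorted run.
        if current and t < current[-1]:
            runs.append(current)
            current = []
        current.append(t)
    if current:
        runs.append(current)
    return runs


def merge_globally(chunks):
    """K-way merge of all sorted slices into one globally sorted list."""
    slices = [run for rows in chunks for run in sorted_slices(rows)]
    return list(heapq.merge(*slices))


# Hypothetical chunks; the last one degenerates into unit-length slices.
chunks = [[1, 4, 2, 9], [3, 5, 8], [7, 6]]
print(sorted_slices(chunks[0]))  # [[1, 4], [2, 9]]
print(sorted_slices(chunks[2]))  # [[7], [6]]
print(merge_globally(chunks))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The merge is lazy and costs roughly O(n log k) for n rows across k slices, so the number of slices matters: many unit-length slices push the cost toward that of a full sort.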
If we don't land this for 0.19, we need to clearly document the behavior.