[multistage][sorting] improve sorting efficiency #12228

@walterddr

Description

Currently we have a very basic implementation of the SortOperator and a very basic rule for generating the cross-server sorting plan. The rule generates a sort exchange, which connects a local sort prior to the exchange, followed by a remote sort on a single machine.
This was originally designed for sorting large quantities of data and for streaming responses, but neither of those was implemented in SortOperator.

Improvement

  1. For the current SortOperator implementation, the sort exchange can be simplified to a normal exchange, and the local sort prior to the exchange can be removed.
  2. For the longer term, we should support 2 strategies:
    • when the data size is small, we should follow the 1st approach;
    • when the data size is large, we should
      (a) create a local sort before the exchange (either via sort exchange or via an additional sort operator);
      (b) do a k-way merge sort in the single-machine global sorting operator;
      (c) stream the results back (which is possible with a k-way merge).
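
To illustrate the large-data strategy in steps (a) to (c), here is a minimal sketch in Python. It is not Pinot code: the `Row` shape and the `local_sort` and `k_way_merge` names are hypothetical, and each "server" is just an in-memory list. It only shows why pre-sorted inputs allow the global operator to emit rows incrementally instead of buffering everything.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape for illustration: (sort key, payload).
Row = Tuple[int, str]


def local_sort(rows: Iterable[Row]) -> List[Row]:
    """Step (a): each sender sorts its own partition before the exchange."""
    return sorted(rows, key=lambda r: r[0])


def k_way_merge(sorted_partitions: List[Iterable[Row]]) -> Iterator[Row]:
    """Steps (b) and (c): lazily merge k pre-sorted inputs.

    Only one row per input is held in the heap, so a row can be
    yielded (streamed back) as soon as it is known to be the next one.
    """
    iterators = [iter(p) for p in sorted_partitions]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # idx breaks ties between equal keys, so rows are never compared.
            heap.append((first[0], idx, first))
    heapq.heapify(heap)

    while heap:
        _, idx, row = heapq.heappop(heap)
        yield row
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))


# Usage: three "servers" each hold an unsorted partition.
partitions = [[(5, "e"), (1, "a")], [(4, "d"), (2, "b")], [(3, "c")]]
merged = k_way_merge([local_sort(p) for p in partitions])
first_row = next(merged)  # available before the remaining rows are consumed
```

With k inputs and n total rows, the merge costs O(n log k) and keeps only k rows in memory at the receiver, whereas the small-data strategy (plain exchange plus a single global sort) buffers all n rows but avoids the extra local sort stage.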
