window functions: use storage sorting key

To compute a window function, we must first sort data according to `PARTITION BY` and then `ORDER BY` clauses of the window.
When the storage sorting key and the `PARTITION BY` + `ORDER BY` of the window have a common prefix, we must avoid sorting the data fully from scratch, and instead read it in (partially) sorted order and then sort incrementally.

For large amounts of data, sorting it is not feasible, so this incremental sorting is the only way to compute a window function. See an example we discussed in #18097:

> 'Crash tests' use case. Imagine a fast trading dataset.
>
> Every second you have up to 10 million transactions with a micro/nanosecond resolution.
> 
> The task is to calculate the sum and the number of transactions in the exactly X seconds window prepending the transaction. (not on the second boundary but back from the current event). Single select can extract up to several weeks of data.
> 
> This is sum over (order by transaction_time range between 'X seconds' preceding and current row).
> 10M records per second * 30 days per month * 24 * 60 * 60 seconds a day = 25T records, probably about 100TB of data compressed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

window functions: use storage sorting key #19289

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

window functions: use storage sorting key #19289

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions