-
Notifications
You must be signed in to change notification settings - Fork 8.3k
window functions: use storage sorting key #19289
Description
To compute a window function, we must first sort data according to PARTITION BY and then ORDER BY clauses of the window.
When the storage sorting key and the PARTITION BY + ORDER BY of the window have a common prefix, we must avoid sorting the data fully from scratch, and instead read it in (partially) sorted order and then sort incrementally.
For large amounts of data, sorting it is not feasible, so this incremental sorting is the only way to compute a window function. See an example we discussed in #18097:
'Crash tests' use case. Imagine a fast trading dataset.
Every second you have up to 10 million transactions with a micro/nanosecond resolution.
The task is to calculate the sum and the number of transactions in the exactly X seconds window prepending the transaction. (not on the second boundary but back from the current event). Single select can extract up to several weeks of data.
This is sum over (order by transaction_time range between 'X seconds' preceding and current row).
10M records per second * 30 days per month * 24 * 60 * 60 seconds a day = 25T records, probably about 100TB of data compressed