-
Notifications
You must be signed in to change notification settings - Fork 8.3k
lagInFrame and leadInFrame does not return NULL for calculated target rows which are out of frame. #26115
Description
Describe the unexpected behaviour
When we use lagInFrame/leadInFrame window function, the returned value of rows that to be out of the frame is 0 or an empty string(depends on the type of the original column). However, like most other DBMSs' implementation, the returned value is null instead of 0 or empty string. Semantically, 0 or empty string can be computed while null cannot. Consequently, 0 or "" leads to incorrect behavior in subsequent procedures if this kind of returned values are accepted, 0 and empty strings really means something semantically in many cases.
I looked into the document, don't quite understand the term "respect the frame", not sure if it means 0 and empty string must be the default value it should use, when the target row to calculate is out of frame.
How to reproduce
- Which ClickHouse server version to use
Any version since window function lagInFrame/leadInFrame are introduced.
For example, in the query below:
SELECT
number,
lagInFrame(number, 1) OVER w AS prevOne,
lagInFrame(number, 2) OVER w AS prevTwo
FROM numbers(10)
WINDOW w AS (ORDER BY number ASC)
Query id: 1027831f-0764-4a08-bc6a-70fd2752a4dc
┌─number─┬─prevOne─┬─prevTwo─┐
│ 0 │ 0 │ 0 │
│ 1 │ 0 │ 0 │
│ 2 │ 1 │ 0 │
│ 3 │ 2 │ 1 │
│ 4 │ 3 │ 2 │
│ 5 │ 4 │ 3 │
│ 6 │ 5 │ 4 │
│ 7 │ 6 │ 5 │
│ 8 │ 7 │ 6 │
│ 9 │ 8 │ 7 │
└────────┴─────────┴─────────┘
In the row 1, I'm expecting prevOne and prevTwo to be NULL, and in row 2, expecting prevTwo to be NULL, but obviously they are 0 instead.
Expected behavior
For rows that the function runs on target row is out of frame, expected returned value must be null as query below:
SELECT
number,
lagInFrame(number, 1) OVER w AS prevOne,
lagInFrame(number, 2) OVER w AS prevTwo
FROM numbers(10)
WINDOW w AS (ORDER BY number ASC)
Query id: ad65c077-90ef-4821-9376-c9567be203ff
┌─number─┬─prevOne─┬─prevTwo─┐
│ 0 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
│ 1 │ 0 │ ᴺᵁᴸᴸ │
│ 2 │ 1 │ 0 │
│ 3 │ 2 │ 1 │
│ 4 │ 3 │ 2 │
│ 5 │ 4 │ 3 │
│ 6 │ 5 │ 4 │
│ 7 │ 6 │ 5 │
│ 8 │ 7 │ 6 │
│ 9 │ 8 │ 7 │
└────────┴─────────┴─────────┘
A simple fix of this I can think of is making the return type of the function to be ColumnNullable():
DataTypePtr getReturnType() const override
{ return std::make_shared<DataTypeNullable>(argument_types[0]); }