Skip to content

lagInFrame and leadInFrame does not return NULL for calculated target rows which are out of frame. #26115

@ryzuo

Description

@ryzuo

Describe the unexpected behaviour
When we use lagInFrame/leadInFrame window function, the returned value of rows that to be out of the frame is 0 or an empty string(depends on the type of the original column). However, like most other DBMSs' implementation, the returned value is null instead of 0 or empty string. Semantically, 0 or empty string can be computed while null cannot. Consequently, 0 or "" leads to incorrect behavior in subsequent procedures if this kind of returned values are accepted, 0 and empty strings really means something semantically in many cases.

I looked into the document, don't quite understand the term "respect the frame", not sure if it means 0 and empty string must be the default value it should use, when the target row to calculate is out of frame.

How to reproduce

  • Which ClickHouse server version to use
    Any version since window function lagInFrame/leadInFrame are introduced.
    For example, in the query below:
SELECT
    number,
    lagInFrame(number, 1) OVER w AS prevOne,
    lagInFrame(number, 2) OVER w AS prevTwo
FROM numbers(10)
WINDOW w AS (ORDER BY number ASC)

Query id: 1027831f-0764-4a08-bc6a-70fd2752a4dc

┌─number─┬─prevOne─┬─prevTwo─┐
│      0 │       0 │       0 │
│      1 │       0 │       0 │
│      2 │       1 │       0 │
│      3 │       2 │       1 │
│      4 │       3 │       2 │
│      5 │       4 │       3 │
│      6 │       5 │       4 │
│      7 │       6 │       5 │
│      8 │       7 │       6 │
│      9 │       8 │       7 │
└────────┴─────────┴─────────┘

In the row 1, I'm expecting prevOne and prevTwo to be NULL, and in row 2, expecting prevTwo to be NULL, but obviously they are 0 instead.

Expected behavior
For rows that the function runs on target row is out of frame, expected returned value must be null as query below:

SELECT
    number,
    lagInFrame(number, 1) OVER w AS prevOne,
    lagInFrame(number, 2) OVER w AS prevTwo
FROM numbers(10)
WINDOW w AS (ORDER BY number ASC)

Query id: ad65c077-90ef-4821-9376-c9567be203ff

┌─number─┬─prevOne─┬─prevTwo─┐
│      0 │    ᴺᵁᴸᴸ │    ᴺᵁᴸᴸ │
│      1 │       0 │    ᴺᵁᴸᴸ │
│      2 │       1 │       0 │
│      3 │       2 │       1 │
│      4 │       3 │       2 │
│      5 │       4 │       3 │
│      6 │       5 │       4 │
│      7 │       6 │       5 │
│      8 │       7 │       6 │
│      9 │       8 │       7 │
└────────┴─────────┴─────────┘

A simple fix of this I can think of is making the return type of the function to be ColumnNullable():

DataTypePtr getReturnType() const override
    { return std::make_shared<DataTypeNullable>(argument_types[0]); }

Metadata

Metadata

Assignees

Labels

comp-window-functionsWindow function execution + frame handling (ROW_NUMBER/RANK/LAG/LEAD, frames, partitions, order).unexpected behaviourResult is unexpected, but not entirely wrong at the same time.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions