Skip to content

Implementation of window function nth_value#26334

Merged
akuzm merged 6 commits intoClickHouse:masterfrom
ryzuo:ryzuo
Jul 27, 2021
Merged

Implementation of window function nth_value#26334
akuzm merged 6 commits intoClickHouse:masterfrom
ryzuo:ryzuo

Conversation

@ryzuo
Copy link
Copy Markdown

@ryzuo ryzuo commented Jul 15, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implement window function nth_value(expr, N) that returns the value of the Nth row of the window frame.

Detailed description / Documentation draft:
nth_value(expr, N) returns the value of expr from the N-th row of the window frame. If there is no such row, the return value is NULL.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jul 15, 2021
@ryzuo
Copy link
Copy Markdown
Author

ryzuo commented Jul 15, 2021

Related issue
#26112

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you also fixing the lagInFrame nullability problems in this PR? Better make two separate ones.

This change looks incorrect, because the return type of lagInFrame doesn't have to be nullable, e.g. when a not-null default value is specified. The problems in #26115 are partly caused by that we apply nullability handling for aggregate functions to window functions as well, when we shouldn't. The fix needs some refactoring that would introduce deeper separation between aggregate and window functions. I started working on it but was distracted by other tasks. Probably will continue next week.

So in the meantime, could you please remove the fix for lagInFrame from this PR, and keep only the new function nth_value? That function looks good.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments, you are right, the force of nullability here is not appropriate and incomplete I accidently pushed it, I will clean it up, and I'm still looking into this problem. Speaking of the non-null default value, I found the result is not correct if the given default value is an arithmetic expression with type cast required, e.g. if the default value for the 3rd arg is like value*1.5, it looks like over flows. I tracked all the way up to interpreter, and also with null value, my guess there might be something in parser when it deduce the type of the 3rd arg? But I haven't looked into that part yet. I'm new in CH, still trying to understand lots of things.

@akuzm akuzm self-assigned this Jul 15, 2021
@ryzuo
Copy link
Copy Markdown
Author

ryzuo commented Jul 21, 2021

@akuzm Hi Alexander, there are still some check failures do not seem to relate to the code, Could you please help to take a look? And is there any more comments and suggestions?

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented Jul 21, 2021

@akuzm Hi Alexander, there are still some check failures do not seem to relate to the code, Could you please help to take a look? And is there any more comments and suggestions?

clang-tidy complains about your changes:

2021-07-21 02:40:24 /build/obj-x86_64-linux-gnu/../src/Processors/Transforms/WindowTransform.cpp:1624:18: error: invalid case style for local variable 'srcColumnPtr' [readability-identifier-naming,-warnings-as-errors]
2021-07-21 02:40:24             auto srcColumnPtr = transform->blockAt(target_row).input_columns[workspace.argument_column_indices[0]];
2021-07-21 02:40:24                  ^~~~~~~~~~~~
2021-07-21 02:40:24                  src_column_ptr

Other things are probably not relevant, I'll check them before merging.

@ryzuo
Copy link
Copy Markdown
Author

ryzuo commented Jul 21, 2021

I see, I will update that, thanks for your comments. And I saw your PR for 26115, should I change the nth_value returning in this PR like your changes? Or I should wait you merge both and open another one?

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented Jul 21, 2021

I see, I will update that, thanks for your comments. And I saw your PR for 26115, should I change the nth_value returning in this PR like your changes? Or I should wait you merge both and open another one?

I just merged it. It would be good if you merge with master and change the nth_value implementation in this PR as well. By the way, I'm open for suggestions/test cases regarding the lagInFrame fix -- not sure I have it all totally figured out yet...

ryzuo and others added 5 commits July 22, 2021 10:19
1. Implemented the window function nth_value
2. Make the return type of lag/lead, and nth_value functions to be
ColumnNullable, thus the returned value can be null if it is out
of frame.
Make nth_value for nullable values for out of frame rows, as the same
fashion as lagInFrame just does.
@ryzuo ryzuo requested a review from akuzm July 23, 2021 09:12
Comment on lines +1579 to +1584
if (argument_types.empty())
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Function {} takes at least one argument", name_);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it always needs two arguments, we probably don't need this check, the next one is enough.

Copy link
Copy Markdown
Contributor

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than that one comment.

#include <Interpreters/ExpressionActions.h>
#include <Interpreters/convertFieldToType.h>


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally have two empty lines here, let's keep it.

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented Jul 27, 2021

The tests are flaky/broken in master.

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Sep 25, 2021

@akuzm Unfortunately, this function contains a major security issue: #29347

order by number
;

-- nth_value without specific frame range given
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test should be created instead of adding new cases to old test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants