Skip to content

[wip] a prototype for window functions#18222

Merged
akuzm merged 32 commits intomasterfrom
aku/window-prototype
Dec 23, 2020
Merged

[wip] a prototype for window functions#18222
akuzm merged 32 commits intomasterfrom
aku/window-prototype

Conversation

@akuzm
Copy link
Copy Markdown
Contributor

@akuzm akuzm commented Dec 18, 2020

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 18, 2020
@akuzm akuzm marked this pull request as draft December 18, 2020 14:35
@akuzm akuzm mentioned this pull request Dec 18, 2020
17 tasks
@akuzm akuzm marked this pull request as ready for review December 22, 2020 09:11
if (auto it = window_descriptions.find(window_description.window_name);
it != window_descriptions.end())
{
assert(it->second.full_sort_description
Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check looks a little bit clumsy ...
I would rather group by sort full_sort_description initially.

Also, I suppose it may be reasonable group windows by partition_by, cause we can start common sorting for a window by it, and then add finish sorting by full_sort_description for each window function separately.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll try to write the complete window matching (group the windows declared ad hoc with same parameters) + window declaration grammar, to figure this out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some windows are just totally identical, and then to some windows we can apply the optimization you describe (group them so that they have a common sorting prefix). I have both these points here in the "rough edges" list: #18097

// Populate the reverse map.
for (const auto & f : window_function_descriptions)
{
window_descriptions[f.window_name].window_functions.push_back(f);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we use window_name as grouping key, so it is equal for each window function from one window. OK.
It is interesting if window_name is used somewhere else. If not, we can remove this field form WindowFunctionDescription

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out to be redundant after we group the functions by windows, yes. But I'm not sure where else to put it... Maybe we can just group right away, and remove the common array of window functions and this separate grouping step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to remove it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also addresses a couple of the above comments about maps and duplication of window_name.

col.column = col.type->createColumn();
col.name = f.column_name;

step.actions()->addInput(col);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it is a little bit strange to add window function name as input.
I think it would be logical to add a new action, after the calculation of window function arguments step, which will have this input.

Also it is not clear why we add window function column name to the list of required columns. I suppose it is possible that window function name is not actually used, and we can skip window function evaluation?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it probably may be done the same way as with aggregation.
The idea is that we can evaluate the list of columns after aggregation (aggregated_columns), and start building new ExpressionActionsChain starting from this list. Maybe we can have the same fixed list of columns after window function.

I don't say if it is a good approach and will be easier, though.

Copy link
Copy Markdown
Contributor Author

@akuzm akuzm Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about the various approaches.

The aggregation one is logically similar to INPUT actions, but with more code (i.e. it also add columns as an input to the action chain, but in a different way). And we'll have to remove it anyway and add a proper WINDOW action. So I think we can keep the INPUT actions for now.

* NOTE: if `ast` is a SELECT query from a table, the structure of this table should not change during the lifetime of ExpressionAnalyzer.
*/
class ExpressionAnalyzer : protected ExpressionAnalyzerData, private boost::noncopyable
class ExpressionAnalyzer : public ExpressionAnalyzerData, private boost::noncopyable
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should have been an all-public immutable struct. "Private data + accessors/modifiers" are for things that change over time and have to maintain complex invariants. On the contrary, the ExpressionAnalyzerData is built once at a particular stage of query analysis, and doesn't change. So "all-public immutable" would have been more suitable. But I don't feel like redoing it all now, so for uniformity let's change it back to private add an accessor for window descriptions.

@Akazz
Copy link
Copy Markdown
Contributor

Akazz commented Dec 22, 2020

@akuzm
Your idea of printing test queries along with their results into .reference-file is just awesome!

@alexey-milovidov alexey-milovidov added the 🎅 🎁 gift🎄 To make people wonder label Dec 22, 2020
Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for WIP

@akuzm
Copy link
Copy Markdown
Contributor Author

akuzm commented Dec 22, 2020

@akuzm
Your idea of printing test queries along with their results into .reference-file is just awesome!

Took it from Postgres. Probably we should convert all our tests to echo queries, just did it the quick way for now.

@akuzm
Copy link
Copy Markdown
Contributor Author

akuzm commented Dec 23, 2020

The build failure after merge is because of infrastructure problem.

@akuzm akuzm merged commit 5b0b3b4 into master Dec 23, 2020
@akuzm akuzm deleted the aku/window-prototype branch December 23, 2020 08:24
@akuzm akuzm restored the aku/window-prototype branch December 23, 2020 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🎅 🎁 gift🎄 To make people wonder pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants