Skip to content

Conversation

@somandal
Copy link
Contributor

@somandal somandal commented Apr 3, 2023

This PR adds support for the ranking ROW_NUMBER() window function in Apache Pinot. ROW_NUMBER() requires ROW type window function support rather than RANGE type for which we added support in Phase 1. This PR sets up a potential framework to use ROW type window functions but only implements this for ROW_NUMBER(). ROW_NUMBER() can be used in the following types of queries:

  • Empty OVER()(s) with some select column [see limitations below]
  • OVER(ORDER BY)(s)
  • OVER(PARTITION BY)(s)
  • OVER(PARTITION BY ORDER BY)(s)

There are some limitations with ROW_NUMBER() which are:

  • Apache Calcite enforces the window function type to ROW. Due to the lack of support for multiple window groups today, ROW_NUMBER() cannot be combined with other window aggregation functions in the same query.
  • Queries which use an empty OVER() without any other column results in Apache Calcite not projecting any columns. E.g. query: SELECT ROW_NUMBER() OVER() from table;. I've added a TODO to look into how to get Apache Calcite to project at least one column in this scenario.
    • See ProjectWindowTransposeRule to better understand what's happening here. Basically it tries to push a Project below the Window, but finds no input fields referenced. Due to this it creates an empty Project below the Window. The Project above the Window gets marked as trivial and is removed resulting in the following type of plan:
Query: SELECT ROW_NUMBER() OVER() FROM a

Execution Plan
LogicalWindow(window#0=[window( rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
  LogicalExchange(distribution=[hash])
    LogicalProject
      LogicalTableScan(table=[[a]])

The design document and issue for window functions support can be found below:

Prior Phase 1 PRs related to window functions:

cc @siddharthteotia @walterddr @vvivekiyer @ankitsultana

@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2023

Codecov Report

Merging #10527 (61290f6) into master (6283ee7) will decrease coverage by 0.06%.
The diff coverage is 95.23%.

@@             Coverage Diff              @@
##             master   #10527      +/-   ##
============================================
- Coverage     70.40%   70.34%   -0.06%     
- Complexity     6484     6493       +9     
============================================
  Files          2106     2106              
  Lines        112882   112920      +38     
  Branches      17000    17011      +11     
============================================
- Hits          79477    79437      -40     
- Misses        27835    27908      +73     
- Partials       5570     5575       +5     
Flag Coverage Δ
integration1 24.54% <0.00%> (+0.12%) ⬆️
integration2 24.10% <0.00%> (-0.05%) ⬇️
unittests1 67.92% <95.23%> (+0.01%) ⬆️
unittests2 13.88% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...uery/runtime/operator/WindowAggregateOperator.java 95.28% <95.23%> (-0.80%) ⬇️

... and 32 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@somandal somandal force-pushed the window-functions-row-number branch from ecd1df2 to 61290f6 Compare April 7, 2023 18:41
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes multi-stage Related to the multi-stage query engine labels Apr 10, 2023
@walterddr walterddr merged commit 2ad741d into apache:master Apr 10, 2023
walterddr added a commit that referenced this pull request Apr 10, 2023
walterddr added a commit that referenced this pull request Apr 10, 2023
@somandal somandal deleted the window-functions-row-number branch April 10, 2023 19:27
@yashmayya yashmayya added the window-functions Related to SQL window functions on the multi-stage query engine label Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine release-notes Referenced by PRs that need attention when compiling the next release notes window-functions Related to SQL window functions on the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants