Skip to content

Conversation

@somandal
Copy link
Contributor

@somandal somandal commented Apr 18, 2023

This PR fixes an issue seen with certain types of window function queries using an empty OVER() clause. If the window function itself doesn't take any input columns (such as COUNT(*) or ROW_NUMBER()), the Apache Calcite rule CoreRules.PROJECT_WINDOW_TRANSPOSE creates an empty LogicalProject below the LogicalWindow and removes the LogicalProject above the window as it identifies the upper project as a trivial one.

The following types of queries fail because of the above Apache Calcite bug:

  • SELECT COUNT(*) OVER() from tableName
  • SELECT 42, COUNT(*) OVER() from tableName
  • SELECT ROW_NUMBER() OVER() from tableName [the PR to add support for this is still under review]

The fix this PR adds does the following:

  • Within the PinotWindowExchangeNodeInsertRule we detect the empty LogicalProject below LogicalWindow for empty OVER() type of queries.
  • If found:
    • Add a LogicalProject with a literal below the LogicalWindow
    • Modify the LogicalWindow's inputs to include the literal
    • Add the LogicalExchange (as before)
    • Add a LogicalProject above the LogicalWindow to remove the literal
  • Else just create the exchange below LogicalWindow like we always did

Discussed the above solution with @ankitsultana and @walterddr

cc @siddharthteotia @vvivekiyer

@walterddr walterddr self-requested a review April 18, 2023 18:07
@codecov-commenter
Copy link

codecov-commenter commented Apr 18, 2023

Codecov Report

Merging #10635 (be1d054) into master (4914947) will decrease coverage by 37.85%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##             master   #10635       +/-   ##
=============================================
- Coverage     70.27%   32.42%   -37.85%     
+ Complexity     6514      462     -6052     
=============================================
  Files          2108     2108               
  Lines        113904   113904               
  Branches      17192    17192               
=============================================
- Hits          80049    36938    -43111     
- Misses        28249    73706    +45457     
+ Partials       5606     3260     -2346     
Flag Coverage Δ
integration1 ?
integration2 23.95% <ø> (+0.01%) ⬆️
unittests1 ?
unittests2 13.84% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 1324 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@vvivekiyer vvivekiyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@siddharthteotia siddharthteotia Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here. MySQL throws error for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this works in PostgreSQL:

dvdrental=# select 42, COUNT(*) OVER() from payment;
 ?column? | count 
----------+-------
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596
       42 | 14596

There are many types of custom window frames that works in PostgreSQL but isn't allowed by Apache Calcite. I'll work on a document to list out the ones I know about. I assume that if we want to move towards PostgreSQL compliance, we want to allow these kinds of queries, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right. Let's get that document ready.

Copy link
Contributor

@siddharthteotia siddharthteotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly questions and clarifications. LGTM otherwise.

@siddharthteotia
Copy link
Contributor

@somandal - can you rebase please ?

@somandal somandal force-pushed the window-transpose-literal-project branch from 356e4c0 to 6928711 Compare April 24, 2023 16:33
@somandal
Copy link
Contributor Author

@somandal - can you rebase please ?

@siddharthteotia I've rebased, can you take a look when you get a chance? Thanks!

@siddharthteotia siddharthteotia merged commit a130cb2 into apache:master Apr 24, 2023
@somandal somandal deleted the window-transpose-literal-project branch April 24, 2023 19:58
@Jackie-Jiang Jackie-Jiang added multi-stage Related to the multi-stage query engine feature labels Apr 25, 2023
@yashmayya yashmayya added the window-functions Related to SQL window functions on the multi-stage query engine label Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine window-functions Related to SQL window functions on the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants