Skip to content

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Aug 21, 2023

To protect the server from OOM, this PR introduces the limit for join operator to ensure the memory consumption is limited.

  1. Add new query options for join operator right table size limit:
  • joinOverflowMode(string) : Overflow mode, allowed value are: THROW/BREAK
    • Server config: pinot.query.join.overflow.mode=THROW
    • Session level: SET joinOverflowMode='THROW'
    • QueryHint level: join_overflow_mode='THROW'
  • maxRowsInJoin(integer) : the number of rows allowed for right table cache in a join operator
    • Server config: pinot.query.join.max.rows=100000
    • Session level: SET maxRowsInJoin=1000000
    • QueryHint level: max_rows_in_join='1000000'
  1. Add server partial result exception
  2. Support continuous processing when joinOverflowMode = BREAK;

TODO: Support size based limit.

Examples:

SET joinOverflowMode='BREAK';
SET maxRowsInJoin=10;
SELECT a.playerName, a.teamID, b.teamName 
FROM baseballStats_OFFLINE AS a
JOIN dimBaseballTeams_OFFLINE AS b
ON a.teamID = b.teamID
SELECT /*+ joinOptions(join_overflow_mode='BREAK', max_rows_in_join='10') */ a.playerName, a.teamID, b.teamName 
FROM baseballStats_OFFLINE AS a
JOIN dimBaseballTeams_OFFLINE AS b
ON a.teamID = b.teamID

Sample screenshot for:
0. Normal run without limit hit: (Joined result is 47735)
image

  1. Query option: Throw exception when row limit hit:
image 2. Query option: Continue when row limit hit: (Joined result is 29884) image
  1. JoinHints: Throw exception when row limit hit:
image
  1. JoinHints: Continue when row limit hit: (Joined result is 29884)
image

More examples of difference between the threshold, and overflow mode:
image
image
image

@xiangfu0 xiangfu0 marked this pull request as draft August 21, 2023 22:10
@xiangfu0 xiangfu0 force-pushed the join-exceptions branch 2 times, most recently from 17e4ab3 to c2e4689 Compare August 22, 2023 06:09
@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2023

Codecov Report

❌ Patch coverage is 63.51351% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.91%. Comparing base (32e47c7) to head (5c36a12).
⚠️ Report is 3798 commits behind head on master.

Files with missing lines Patch % Lines
...va/org/apache/pinot/query/runtime/QueryRunner.java 31.57% 8 Missing and 5 partials ⚠️
...pinot/query/runtime/operator/HashJoinOperator.java 71.11% 4 Missing and 9 partials ⚠️
...e/pinot/common/utils/config/QueryOptionsUtils.java 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #11401      +/-   ##
============================================
+ Coverage     62.89%   62.91%   +0.02%     
- Complexity     1079     1091      +12     
============================================
  Files          2301     2301              
  Lines        123699   123766      +67     
  Branches      18816    18833      +17     
============================================
+ Hits          77803    77873      +70     
+ Misses        40367    40355      -12     
- Partials       5529     5538       +9     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 62.89% <63.51%> (+0.01%) ⬆️
java-17 62.76% <63.51%> (+0.02%) ⬆️
java-20 49.77% <63.51%> (+0.02%) ⬆️
temurin 62.91% <63.51%> (+0.02%) ⬆️
unittests 62.91% <63.51%> (+0.02%) ⬆️
unittests1 67.45% <63.51%> (+0.03%) ⬆️
unittests2 14.50% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 added the multi-stage Related to the multi-stage query engine label Aug 22, 2023
@xiangfu0 xiangfu0 marked this pull request as ready for review August 22, 2023 08:49
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang August 22, 2023 08:49
@xiangfu0 xiangfu0 force-pushed the join-exceptions branch 5 times, most recently from 5c39b24 to 76d463c Compare August 22, 2023 17:59
@xiangfu0 xiangfu0 requested a review from walterddr August 22, 2023 18:00
@Jackie-Jiang Jackie-Jiang added Configuration Config changes (addition/deletion/change in behavior) release-notes Referenced by PRs that need attention when compiling the next release notes labels Aug 22, 2023
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if we read exception from the data block and EOS block? I feel they are just ignored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are ignored right now. Need to open up another thread to fix this.

@xiangfu0 xiangfu0 force-pushed the join-exceptions branch 6 times, most recently from 54b4ea6 to 0c2d050 Compare August 22, 2023 21:45
@xiangfu0 xiangfu0 merged commit e0d0aeb into apache:master Aug 22, 2023
@xiangfu0 xiangfu0 deleted the join-exceptions branch August 22, 2023 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Configuration Config changes (addition/deletion/change in behavior) feature multi-stage Related to the multi-stage query engine release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants