@bziobrowski bziobrowski commented Dec 30, 2024

This PR adds the following to the multi-stage query engine (MSQE):

  • group_trim_size hint: enables trimming at the aggregate operator stage when the query has both ORDER BY and LIMIT (currently this also requires the is_enable_group_trim hint). Note: is_enable_group_trim also enables v1-style trimming of leaf-stage group-by results. See the grouping algorithm documentation for details.
  • error_on_num_groups_limit hint or errorOnNumGroupsLimit query option: throws an exception when num_groups_limit is reached in the aggregate operator, instead of setting a metadata flag.
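
To make the difference between "set a metadata flag" and "throw" concrete, here is a minimal sketch of a count-by-key aggregation with a group limit. It is not the code from this PR: the class `NumGroupsLimitSketch`, its `Result` type, and the method `countByKey` are hypothetical names, and the sketch assumes for simplicity that the limit counts as reached when a new group would exceed it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not Pinot classes. */
public class NumGroupsLimitSketch {

  /** Aggregation output plus a flag standing in for the "limit reached" metadata. */
  public static final class Result {
    public final Map<String, Long> counts;
    public final boolean numGroupsLimitReached;

    Result(Map<String, Long> counts, boolean numGroupsLimitReached) {
      this.counts = counts;
      this.numGroupsLimitReached = numGroupsLimitReached;
    }
  }

  /**
   * Counts occurrences per key while keeping at most numGroupsLimit groups.
   * When a new group would exceed the limit, either throw (strict mode) or
   * drop the new group and remember that the result is incomplete.
   */
  public static Result countByKey(List<String> keys, int numGroupsLimit, boolean errorOnNumGroupsLimit) {
    Map<String, Long> counts = new LinkedHashMap<>();
    boolean limitReached = false;
    for (String key : keys) {
      Long current = counts.get(key);
      if (current != null) {
        // Existing groups keep aggregating even after the limit is hit.
        counts.put(key, current + 1);
        continue;
      }
      if (counts.size() >= numGroupsLimit) {
        if (errorOnNumGroupsLimit) {
          throw new IllegalStateException("num_groups_limit of " + numGroupsLimit + " reached");
        }
        limitReached = true; // the new group is silently dropped
        continue;
      }
      counts.put(key, 1L);
    }
    return new Result(counts, limitReached);
  }
}
```

With the flag variant the caller still receives a (possibly incomplete) result and has to check the flag; with the strict variant an incomplete result can never be mistaken for a complete one.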

Examples:

  • enable group by trimming in MSQE intermediate stage:
    Query:
select /*+  aggOptions(is_enable_group_trim='true',num_groups_limit='50') */ i, j, count(*) as cnt
from tab
group by i, j
order by i, j desc
limit 5

Execution plan:

LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], offset=[0], fetch=[5])
  PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1 DESC]], ...)
    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], fetch=[5])
      PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL]...) <-- trimming happens here
        PinotLogicalExchange(distribution=[hash[0, 1]])
          LeafStageCombineOperator(table=[mytable])
            StreamingInstanceResponse
              CombineGroupBy
                GroupBy(groupKeys=[[i, j]], aggregations=[[count(*)]])
                  Project(columns=[[i, j]])
                    DocIdSet(maxDocs=[40000])
                      FilterMatchEntireSegment(numDocs=[80])
  • enable group by trimming in MSQE leaf and intermediate stage:
    Query:
select /*+  aggOptions(is_enable_group_trim='true',group_trim_size='3') */ t1.i, t1.j, count(*) as cnt
 from tab t1
 join tab t2 on 1=1
 group by t1.i, t1.j
 order by t1.i asc, t1.j asc
 limit 5

Execution plan:

LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], offset=[0], fetch=[5])
  PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1]], isSortOnSender=[false], isSortOnReceiver=[true])
    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[5])
      PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL], ...) <-- trimming happens here
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT()], aggType=[LEAF], ...) <-- trimming happens here
            LogicalJoin(condition=[true], joinType=[inner])
              PinotLogicalExchange(distribution=[random])
                LeafStageCombineOperator(table=[mytable])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[mytable], totalDocs=[80])
                        Project(columns=[[i, j]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[80])
              PinotLogicalExchange(distribution=[broadcast])
                LeafStageCombineOperator(table=[mytable])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[mytable], totalDocs=[80])
                        Transform(expressions=[['0']])
                          Project(columns=[[]])
                            DocIdSet(maxDocs=[40000])
                              FilterMatchEntireSegment(numDocs=[80])
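
For readers unfamiliar with group trimming, the idea at the marked operators can be sketched as keeping only the best `trimSize` groups under the ORDER BY comparator. This is an illustration under stated assumptions, not the implementation in this PR: `GroupTrimSketch` and `trim` are hypothetical names, groups are assumed to be fully aggregated before trimming, and a bounded heap is just one way to do it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: hypothetical names, not Pinot classes. */
public class GroupTrimSketch {

  /**
   * Returns at most trimSize groups, namely those that sort first under orderBy,
   * in ORDER BY order. Uses a bounded heap so memory stays O(trimSize).
   */
  public static <T> List<T> trim(List<T> groups, Comparator<T> orderBy, int trimSize) {
    if (trimSize <= 0) {
      return new ArrayList<>();
    }
    // Reversed comparator: the heap head is the worst group kept so far.
    PriorityQueue<T> heap = new PriorityQueue<>(orderBy.reversed());
    for (T group : groups) {
      if (heap.size() < trimSize) {
        heap.add(group);
      } else if (orderBy.compare(group, heap.peek()) < 0) {
        // The new group sorts before the current worst one, so it replaces it.
        heap.poll();
        heap.add(group);
      }
    }
    List<T> kept = new ArrayList<>(heap);
    kept.sort(orderBy);
    return kept;
  }
}
```

For a query shaped like the first example (order by i, j desc), each group could be represented as an `int[]{i, j, cnt}` and the comparator built with `Comparator.comparingInt` on `i` followed by a reversed comparison on `j`. The trim size must be at least the LIMIT for the final result to stay correct.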

cc @Jackie-Jiang @gortiz


codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 59.18367% with 60 lines in your changes missing coverage. Please review.

Project coverage is 63.89%. Comparing base (59551e4) to head (94f20d6).
Report is 1570 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...inot/controller/helix/ControllerRequestClient.java | 0.00% | 22 Missing ⚠️ |
| ...ry/runtime/operator/MultistageGroupByExecutor.java | 52.63% | 17 Missing and 1 partial ⚠️ |
| ...inot/query/runtime/operator/AggregateOperator.java | 64.10% | 6 Missing and 8 partials ⚠️ |
| ...va/org/apache/pinot/query/runtime/QueryRunner.java | 42.85% | 1 Missing and 3 partials ⚠️ |
| .../pinot/query/service/dispatch/QueryDispatcher.java | 92.85% | 0 Missing and 1 partial ⚠️ |
| ...spi/utils/builder/ControllerRequestURLBuilder.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14727      +/-   ##
============================================
+ Coverage     61.75%   63.89%   +2.14%     
- Complexity      207     1612    +1405     
============================================
  Files          2436     2704     +268     
  Lines        133233   151088   +17855     
  Branches      20636    23342    +2706     
============================================
+ Hits          82274    96537   +14263     
- Misses        44911    47323    +2412     
- Partials       6048     7228    +1180     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.84% <59.18%> (+2.13%) ⬆️
java-21 63.75% <59.18%> (+2.12%) ⬆️
skip-bytebuffers-false 63.89% <59.18%> (+2.14%) ⬆️
skip-bytebuffers-true 63.70% <59.18%> (+35.97%) ⬆️
temurin 63.89% <59.18%> (+2.14%) ⬆️
unittests 63.89% <59.18%> (+2.14%) ⬆️
unittests1 56.31% <69.60%> (+9.42%) ⬆️
unittests2 34.15% <10.88%> (+6.42%) ⬆️


# Conflicts:
#	pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/logical/PinotLogicalAggregate.java
#	pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java
#	pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java
#	pinot-query-planner/src/test/resources/queries/GroupByPlans.json
#	pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestVisitor.java
#	pinot-query-runtime/src/test/resources/queries/QueryHints.json
@gortiz gortiz merged commit b6904da into apache:master Jan 14, 2025
21 checks passed
zeronerdzerogeekzerocool pushed a commit to zeronerdzerogeekzerocool/pinot that referenced this pull request Feb 20, 2025
* group_trim_size hint - that enables trimming at aggregate operator stage if both order by and limit are available (currently requires using is_enable_group_trim hint). Note: is_enable_group_trim also enables v1-style leaf-stage group by results trimming. See [grouping algorithm documentation](https://docs.pinot.apache.org/users/user-guide-query/query-syntax/grouping-algorithm) for details.
* error_on_num_groups_limit hint or errorOnNumGroupsLimit query option - throws exception when num_groups_limit is reached in aggregate operator instead of setting a metadata flag