Multi stage explain #13733

gortiz · 2024-08-01T15:12:06Z

This PR introduces a new way to explain multi-stage queries in Pinot.
The main goal is to provide a more detailed explanation of the query execution plan, including information about
the physical operators that are being used.

Warning

Edit on Jan 2025: The next paragraph is incorrect. The new explain plan is disabled in 1.3.0 and we plan to enable it by default in future versions. Please refer to Pinot explain documentation to see the different versions and how they could be enabled.

By default, explain plan for will return the new plan. If you want to use the old plan, you can use
explain plan without implementation for. This may be problematic, so we can discuss introducing a new flag for this.
The main reason to break the default behavior is that the new plan is more verbose and is actually what a user should expect
when asking for the implementation, at least following Calcite terminology.
Alternatively, we can change the syntax in the same way we already did with explain physical plan for.

At the architectural level, the new explain mode is closer to the one used in single stage.
The broker parses and optimizes the query, generating a logical plan made of RelNodes.
These nodes are transformed into PlanNodes as usual and sent to the servers.
But instead of asking the servers to execute the plan, the broker asks them to explain it using a new protobuf endpoint.
This new endpoint returns a list of PlanNodes.

When the server receives the explain request, it analyzes the plan looking for leaf operators and creates single-stage
operators as usual.
There are two key differences with respect to the execution mode:

The server tracks which PlanNodes have been converted into single-stage operators.
The server does not execute the operator. Instead, it calls a newly introduced method, Operator.getOperatorInfo,
which returns the same information returned by Operator.explainPlan but in POJOs.

The server then converts these POJOs into PlanNodes and substitutes the tracked PlanNodes with the new ones.
Finally the new plan is sent back to the broker.
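
The server-side substitution described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, `substitute` and `render` are hypothetical stand-ins, not Pinot classes, and the real tracking of converted PlanNodes is likely more involved.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Pinot classes.
final class ServerExplainSketch {
  /** Minimal stand-in for a plan node: a name plus its inputs. */
  static final class Node {
    final String name;
    final List<Node> inputs;

    Node(String name, List<Node> inputs) {
      this.name = name;
      this.inputs = inputs;
    }
  }

  /**
   * Returns a copy of the plan in which every tracked node (looked up by identity)
   * is replaced by the explained node registered for it.
   */
  static Node substitute(Node node, Map<Node, Node> tracked) {
    Node explained = tracked.get(node);
    if (explained != null) {
      return explained;
    }
    List<Node> newInputs = new ArrayList<>();
    for (Node input : node.inputs) {
      newInputs.add(substitute(input, tracked));
    }
    return new Node(node.name, newInputs);
  }

  /** Renders a plan on a single line, with inputs in parentheses. */
  static String render(Node node) {
    if (node.inputs.isEmpty()) {
      return node.name;
    }
    StringBuilder sb = new StringBuilder(node.name).append('(');
    for (int i = 0; i < node.inputs.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(render(node.inputs.get(i)));
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    Node scan = new Node("TableScan", List.of());
    Node filter = new Node("Filter", List.of(scan));
    Node send = new Node("MailboxSend", List.of(filter));
    // The server remembers which plan node became a single-stage operator tree.
    Map<Node, Node> tracked = new IdentityHashMap<>();
    tracked.put(filter, new Node("Explained", List.of()));
    System.out.println(render(substitute(send, tracked)));
  }
}
```

An identity-based map is used here so that two structurally equal nodes in different parts of the plan are not confused with each other; whether the PR tracks nodes this way is an assumption.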

In order to include physical information (such as the indexes used) in the PlanNode, a new ExplainedPlanNode is
created.
These nodes are not meant to be translated into actual operators, but to be used to explain the query execution plan.
When the broker receives the PlanNodes, it converts them back into a RelNode using a new class PinotExplainedRelNode.

Then the broker substitutes the original logical RelNodes with the new ones returned by the servers.
Finally, it explains the RelNode as expected in Calcite.
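
The final rendering step produces an indented tree, like the plans quoted later in this conversation. A minimal sketch of that kind of rendering is shown here, assuming a hypothetical `Node` type and two spaces of indentation per level; it does not use Calcite's own explain API.

```java
import java.util.List;

// Hypothetical sketch: Node is not a Calcite RelNode, just a label with inputs.
final class ExplainRenderSketch {
  static final class Node {
    final String label;
    final List<Node> inputs;

    Node(String label, List<Node> inputs) {
      this.label = label;
      this.inputs = inputs;
    }
  }

  /** Renders the tree with one node per line, indenting each level by two spaces. */
  static String explain(Node root) {
    StringBuilder sb = new StringBuilder();
    append(root, 0, sb);
    return sb.toString();
  }

  private static void append(Node node, int depth, StringBuilder sb) {
    sb.append("  ".repeat(depth)).append(node.label).append('\n');
    for (Node input : node.inputs) {
      append(input, depth + 1, sb);
    }
  }

  public static void main(String[] args) {
    Node leaf = new Node("LeafStageCombineOperator", List.of());
    Node exchange = new Node("PinotLogicalExchange", List.of(leaf));
    Node join = new Node("LogicalJoin", List.of(exchange));
    System.out.print(explain(join));
  }
}
```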

The result can be seen in the following pictures:

Without implementation (similar to current explain)

With implementation:

The PR is still a work in progress, but it is already partially functional.

codecov-commenter · 2024-08-05T10:48:36Z

Codecov Report

Attention: Patch coverage is 17.52161% with 1431 lines in your changes missing coverage. Please review.

Project coverage is 64.20%. Comparing base (59551e4) to head (6641543).
Report is 1081 commits behind head on master.

Files with missing lines	Patch %	Lines
...he/pinot/query/planner/explain/PlanNodeMerger.java	0.00%	251 Missing ⚠️
.../query/planner/logical/PlanNodeToRelConverter.java	0.00%	239 Missing ⚠️
...he/pinot/query/planner/explain/PlanNodeSorter.java	0.00%	79 Missing ⚠️
...inot/query/planner/logical/RexExpressionUtils.java	1.36%	72 Missing ⚠️
...apache/pinot/query/service/server/QueryServer.java	43.01%	51 Missing and 2 partials ⚠️
...t/query/planner/explain/ExplainNodeSimplifier.java	0.00%	49 Missing ⚠️
...ry/planner/explain/AskingServerStageExplainer.java	0.00%	47 Missing ⚠️
.../apache/pinot/core/plan/PinotExplainedRelNode.java	0.00%	43 Missing ⚠️
...e/operator/LeafStageTransferableBlockOperator.java	0.00%	38 Missing ⚠️
...va/org/apache/pinot/query/runtime/QueryRunner.java	2.63%	37 Missing ⚠️
... and 74 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #13733      +/-   ##
============================================
+ Coverage     61.75%   64.20%   +2.44%     
- Complexity      207     1534    +1327     
============================================
  Files          2436     2594     +158     
  Lines        133233   142748    +9515     
  Branches      20636    21864    +1228     
============================================
+ Hits          82274    91646    +9372     
+ Misses        44911    44349     -562     
- Partials       6048     6753     +705

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`64.18% <17.52%> (+2.47%)`	⬆️
java-21	`64.07% <17.52%> (+2.45%)`	⬆️
skip-bytebuffers-false	`64.19% <17.52%> (+2.45%)`	⬆️
skip-bytebuffers-true	`64.05% <17.52%> (+36.32%)`	⬆️
temurin	`64.20% <17.52%> (+2.44%)`	⬆️
unittests	`64.19% <17.52%> (+2.44%)`	⬆️
unittests1	`55.62% <17.59%> (+8.73%)`	⬆️
unittests2	`34.55% <1.84%> (+6.81%)`	⬆️

yashmayya

@gortiz something I just realized is that we can lose the table related information in the new explain plan. For instance, this query on the basic quickstart:

EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR WITH tmp AS (
  select playerID,
    teamID,
    SUM(homeRuns) as totalHomeRuns
  from baseballStats
  WHERE yearID > 2000
  GROUP BY playerID,
    teamID
  ORDER BY totalHomeRuns DESC
)
SELECT *
FROM tmp
  JOIN dimBaseballTeams ON tmp.teamID = dimBaseballTeams.teamID;

returns:

Execution Plan
LogicalJoin(condition=[=($1, $3)], joinType=[inner])
  PinotLogicalExchange(distribution=[hash[1]])
    PinotLogicalAggregate(group=[{0, 1}], agg#0=[$SUM0($2)])
      PinotLogicalExchange(distribution=[hash[0, 1]])
        LeafStageCombineOperator
          StreamingInstanceResponse
            CombineGroupBy
              GroupBy(groupKeys=[[playerID, teamID]], aggregations=[[sum(homeRuns)]])
                Project(columns=[[homeRuns, teamID, playerID]])
                  DocIdSet(maxDocs=[10000])
                    FilterFullScan(predicate=[yearID > '2000'], operator=[RANGE])
  PinotLogicalExchange(distribution=[hash[0]])
    LeafStageCombineOperator
      StreamingInstanceResponse
        StreamingCombineSelect
          SelectStreaming(segment=[dimBaseballTeams_OFFLINE_0], table=[dimBaseballTeams], totalDocs=[51])
            Project(columns=[[teamName, teamID]])
              DocIdSet(maxDocs=[10000])
                FilterMatchEntireSegment(numDocs=[51])

whereas earlier it would've returned:

Execution Plan
LogicalJoin(condition=[=($1, $3)], joinType=[inner])
  PinotLogicalExchange(distribution=[hash[1]])
    PinotLogicalAggregate(group=[{0, 1}], agg#0=[$SUM0($2)])
      PinotLogicalExchange(distribution=[hash[0, 1]])
        PinotLogicalAggregate(group=[{16, 25}], agg#0=[$SUM0($11)])
          LogicalFilter(condition=[>($27, 2000)])
            LogicalTableScan(table=[[default, baseballStats]])
  PinotLogicalExchange(distribution=[hash[0]])
    LogicalProject(teamID=[$3], teamName=[$4])
      LogicalTableScan(table=[[default, dimBaseballTeams]])

For simpler queries, this probably won't be a big issue but for complex queries with lots of joins, CTEs etc. I think it's pretty important to include this information.

Looks like you updated StreamingSelectionOnlyOperator to include the table name in its explain attributes; I guess we should do something similar for all other such operators that can be used in the leaf stage (for instance, StreamingInstanceResponseOperator)?

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

yashmayya · 2024-09-13T09:51:45Z

pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java

      public static final int V1 = 1;
    }
+
+    public static final String ASK_SERVERS_FOR_EXPLAIN_PLAN = "pinot.query.explain.ask.servers";


Agreed on usage of physical, what about something like pinot.multistage.explain.query.include.segment.level.plan? It's fairly verbose but hopefully should be able to convey intent clearly to users.

yashmayya · 2024-09-13T09:54:22Z

...-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultiStageOperator.java

+  protected List<ExplainInfo> getChildrenExplainInfo() {
+    return getChildOperators().stream()
+        .filter(Objects::nonNull)
+        .map(Operator::getExplainInfo)
+        .collect(Collectors.toList());
+  }
+
+  protected String getExplainName() {
+    return toExplainString();
+  }
+
+  protected Map<String, Plan.ExplainNode.AttributeValue> getExplainAttributes() {
+    return Collections.emptyMap();
+  }


Hm, agree on the current state of the Operator interface. Thanks for adding Javadocs to all the explain related methods - that should help out quite a bit. I think we can discuss potential refactoring of that interface separately, this looks good for now.

yashmayya · 2024-09-13T10:31:03Z

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

            explain.getDetailLevel() == null ? SqlExplainLevel.DIGEST_ATTRIBUTES : explain.getDetailLevel();
        Set<String> tableNames = RelToPlanNodeConverter.getTableNamesFromRelRoot(relRoot.rel);
-        return new QueryPlannerResult(null, PlannerUtils.explainPlan(relRoot.rel, format, level), tableNames);
+        if (!explain.withImplementation() || !askServers) {


Sounds good, we can also plan to deprecate the EXPLAIN IMPLEMENTATION PLAN FOR syntax and change it to one of the WITH <extension> in the future.
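
The condition in the diff above can be read as a small decision table. The sketch below assumes that the guarded branch returns the broker-only plan and that the servers are asked otherwise; `Mode` and `chooseMode` are hypothetical names, not part of the PR.

```java
// Hypothetical sketch of the decision implied by
// `!explain.withImplementation() || !askServers`.
final class ExplainModeSketch {
  enum Mode { BROKER_ONLY, ASK_SERVERS }

  /** Servers are only asked when the implementation is requested and the flag is on. */
  static Mode chooseMode(boolean withImplementation, boolean askServers) {
    if (!withImplementation || !askServers) {
      return Mode.BROKER_ONLY;
    }
    return Mode.ASK_SERVERS;
  }

  public static void main(String[] args) {
    for (boolean impl : new boolean[] {false, true}) {
      for (boolean ask : new boolean[] {false, true}) {
        System.out.println("withImplementation=" + impl + ", askServers=" + ask
            + " -> " + chooseMode(impl, ask));
      }
    }
  }
}
```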

yashmayya · 2024-09-13T11:23:25Z

pinot-query-runtime/src/main/java/org/apache/pinot/query/service/dispatch/DispatchClient.java

+
+  public void explain(Worker.QueryRequest request, QueryServerInstance virtualServer, Deadline deadline,
+      Consumer<AsyncResponse<List<Worker.ExplainResponse>>> callback) {
+    _dispatchStub.withDeadline(deadline).explain(request, new AllValuesDispatchObserver<>(virtualServer, callback));


What I don't get is why the proto file is defined as:

rpc Submit(ServerRequest) returns (stream ServerResponse);
instead of

rpc Submit(ServerRequest) returns (ServerResponse);

That's the proto definition for GrpcQueryServer right (which is for v1 streaming queries FWICT)? The proto definition for the multi-stage engine's QueryServer Submit RPC is -

pinot/pinot-common/src/main/proto/worker.proto

Line 26 in de577bc

rpc Submit(QueryRequest) returns (QueryResponse);

So I guess this makes sense now since the new Explain RPC returns a stream ExplainResponse and the implementation does call onNext multiple times.
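
To illustrate why a streaming response needs an observer that collects every value, here is a minimal sketch. `ResponseObserver` and `CollectingObserver` are hypothetical types modeled on the usual onNext/onError/onCompleted contract; this is not the PR's AllValuesDispatchObserver and it does not depend on gRPC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: a stand-in observer contract, not the gRPC StreamObserver.
final class StreamingExplainSketch {
  interface ResponseObserver<T> {
    void onNext(T value);
    void onError(Throwable error);
    void onCompleted();
  }

  /** Buffers every streamed value and hands the full list over once the stream completes. */
  static final class CollectingObserver<T> implements ResponseObserver<T> {
    private final List<T> values = new ArrayList<>();
    private final Consumer<List<T>> onSuccess;
    private final Consumer<Throwable> onFailure;
    private boolean finished;

    CollectingObserver(Consumer<List<T>> onSuccess, Consumer<Throwable> onFailure) {
      this.onSuccess = onSuccess;
      this.onFailure = onFailure;
    }

    @Override
    public void onNext(T value) {
      if (!finished) {
        values.add(value);
      }
    }

    @Override
    public void onError(Throwable error) {
      if (!finished) {
        finished = true;
        onFailure.accept(error);
      }
    }

    @Override
    public void onCompleted() {
      if (!finished) {
        finished = true;
        onSuccess.accept(List.copyOf(values));
      }
    }
  }

  public static void main(String[] args) {
    CollectingObserver<String> observer =
        new CollectingObserver<>(System.out::println, Throwable::printStackTrace);
    observer.onNext("plan for stage 1");
    observer.onNext("plan for stage 2");
    observer.onCompleted();
  }
}
```

With a unary RPC the callback would fire on the single response; with a streamed response the caller only sees the result after the server signals completion. The sketch is not thread-safe, which a real dispatcher would have to consider.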

yashmayya · 2024-09-13T11:30:41Z

...ery-planner/src/main/java/org/apache/pinot/query/planner/logical/PlanNodeToRelConverter.java

+    return visitor.build();
+  }
+
+  private static class ConverterVisitor implements PlanNodeVisitor<Void, Void> {


Makes sense, thanks for elaborating! 😄

yashmayya · 2024-09-13T11:40:25Z

pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java

+    if (PipelineBreakerExecutor.hasPipelineBreakers(stagePlan)) {
+      // TODO: Support pipeline breakers before merging this feature.
+      LOGGER.error("Pipeline breaker is not supported in explain query");
+      return stagePlan;
+    }


Ah, I hadn't thought of that either, thanks for the explanation! I guess we can just update that TODO comment for now, it makes sense to defer this considering the current explain also doesn't properly support pipeline breaker.

gortiz · 2024-09-18T14:00:13Z

> something I just realized is that we can lose the table related information in the new explain plan. For instance, this query on the basic quickstart:
> Looks like you updated StreamingSelectionOnlyOperator to include the table name in its explain attributes; I guess we should do something similar for all other such operators that can be used in the leaf stage (for instance, StreamingInstanceResponseOperator)?

Nice catch. Yes, I found the same problem and tried to solve it that way, but it doesn't seem to be a scalable solution: we can very easily end up with an operator that does not register the table. Instead, I've added the table to the LeafStageTransferableBlockOperator, which by definition knows the table.
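
The idea of registering the table once at the leaf-stage wrapper, rather than in every inner operator, can be sketched as below. `ExplainInfo` and `explainLeafStage` are hypothetical here and only mimic the shape of the explain data; the attribute key `table` and the node title are taken from this conversation, while everything else is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ExplainInfo here is a simplified stand-in, not the PR's class.
final class LeafStageTableSketch {
  static final class ExplainInfo {
    final String title;
    final Map<String, String> attributes;
    final List<ExplainInfo> inputs;

    ExplainInfo(String title, Map<String, String> attributes, List<ExplainInfo> inputs) {
      this.title = title;
      this.attributes = attributes;
      this.inputs = inputs;
    }
  }

  /**
   * Wraps the explain info of the segment-level operators and attaches the table once,
   * so inner operators do not need to know or report it.
   */
  static ExplainInfo explainLeafStage(String table, ExplainInfo segmentPlan) {
    Map<String, String> attributes = new LinkedHashMap<>();
    attributes.put("table", table);
    return new ExplainInfo("LeafStageCombineOperator", attributes, List.of(segmentPlan));
  }

  public static void main(String[] args) {
    ExplainInfo groupBy = new ExplainInfo("GroupBy", Map.of(), List.of());
    ExplainInfo leaf = explainLeafStage("baseballStats", groupBy);
    System.out.println(leaf.title + " " + leaf.attributes);
  }
}
```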

yashmayya

Thanks @gortiz, LGTM! There are some new tests added in #13999 failing with an NPE here though (and looks related to the explain changes).

yashmayya · 2024-09-23T11:39:21Z

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/explain/PlanNodeSorter.java

+          return cmp2;
+        }
+        int cmp3;
+        switch (value1.getValueCase()) {


Do we not need to handle the STRINGLIST case here?

Probably because either I missed it or because JSON (the previous type) is difficult to sort. Adding it now that it is just a list of strings

Changed in 2722631
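
For reference, one way to compare two lists of strings is element by element, falling back to length when one list is a prefix of the other. The sketch below assumes that lexicographic ordering; the comparator actually added in 2722631 may differ, and `compareStringLists` is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of a lexicographic comparator for lists of strings.
final class StringListComparatorSketch {
  /** Compares element by element; if one list is a prefix of the other, the shorter sorts first. */
  static int compareStringLists(List<String> left, List<String> right) {
    int common = Math.min(left.size(), right.size());
    for (int i = 0; i < common; i++) {
      int cmp = left.get(i).compareTo(right.get(i));
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(left.size(), right.size());
  }

  public static void main(String[] args) {
    System.out.println(compareStringLists(List.of("a", "b"), List.of("a", "c")));
  }
}
```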

...rc/main/java/org/apache/pinot/query/runtime/operator/LeafStageTransferableBlockOperator.java

gortiz · 2024-09-23T12:47:03Z

> Thanks @gortiz, LGTM! There are some new tests added in #13999 failing with an NPE here though (and looks related to the explain changes).

Yes, I've fixed that, but now the same tests fail due to an assertion in the test that I don't understand. That is why we need to merge this ASAP. The code in ServerQueryExecutorV1Impl is very sensitive and has been modified more often than it used to be because of the timeseries code.

gortiz · 2024-09-23T12:57:40Z

The issue should be fixed. Anyway, my error merging this code showed that #13999 is not testing the empty case, and to me it also looks like the code that was added to ServerQueryExecutorV1Impl in that PR should have been added to ResultsBlockUtils.buildEmptyQueryResults instead. @ankitsultana can you take a look at that?

yashmayya · 2024-10-01T04:21:48Z

pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java

+         *
+         * Use false in order to mimic behavior of Pinot 1.2.0 and previous.
+         */
+        public static final String EXPLAIN_ASKING_SERVERS = "explainAskingServers";


I think we should update the query option for consistency since we updated the broker config from pinot.query.explain.ask.servers to pinot.query.multistage.explain.include.segment.plan. Probably to explainIncludeSegmentPlan?
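
The relationship between the query option and the broker-level setting can be sketched as follows. It assumes that a per-query option, when present, overrides the broker default; that precedence and the helper name `shouldAskServers` are assumptions, while the option key is the one shown in the diff above.

```java
import java.util.Map;

// Hypothetical sketch: resolves the effective flag from query options and a broker default.
final class ExplainOptionSketch {
  // Query option key as it appears in the diff under review.
  static final String QUERY_OPTION = "explainAskingServers";

  /** Uses the query option when it is set; otherwise falls back to the broker default. */
  static boolean shouldAskServers(Map<String, String> queryOptions, boolean brokerDefault) {
    String value = queryOptions.get(QUERY_OPTION);
    return value == null ? brokerDefault : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    System.out.println(shouldAskServers(Map.of(QUERY_OPTION, "true"), false));
    System.out.println(shouldAskServers(Map.of(), false));
  }
}
```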

yashmayya · 2024-10-01T04:22:23Z

...ker/src/main/java/org/apache/pinot/broker/requesthandler/MultiStageBrokerRequestHandler.java


  private final WorkerManager _workerManager;
  private final QueryDispatcher _queryDispatcher;
+  private final boolean _explainAskingServerDefault;


nit: let's update this member variable name too.

yashmayya · 2024-10-09T05:47:08Z

#14193

gortiz force-pushed the multi-stage-explain branch from 8722ab7 to a3b3874 Compare August 5, 2024 10:01

gortiz requested review from Jackie-Jiang and yashmayya August 5, 2024 10:04

yashmayya added multi-stage Related to the multi-stage query engine feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Aug 5, 2024

gortiz force-pushed the multi-stage-explain branch from 5725a84 to 67c8e5c Compare August 7, 2024 14:37

gortiz added 20 commits August 13, 2024 12:12

Split QueryDispatcher.submit into different methods to improve readability

f5f200c

multi-stage-explain: First commit

9a063bc

multi-stage-explain: Add a flag to ask servers

5ed1a2a

Add a flag we can use to decide if we want to use the new plan or the old one by default

Update headers and imports

a259be3

multi-stage-explain: Fix flag

bdadc90

multi-stage-explain: Include all plans

68364a4

multi-stage-explain: Improve some v1 operator messages in v2

9960511

multi-stage-explain: Rename ImplementationExplainUtils as MultiStageExplainAskingServersUtils

28af584

multi-stage-explain: support window and setop

a704bbe

multi-stage-explain: support MailboxReceive and MailboxSend

d2e2689

multi-stage-explain: Combine different plans from different segments/servers

0ded26e

multi-stage-explain: Support SET explainPlanVerbose=true;

427e57c

multi-stage-explain: Simplify QueryServer

f14a954

multi-stage-explain: Improve error message when MultiStageBrokerRequestHandler cannot obtain the physical plan

fd62169

multi-stage-explain: simplify git diff in ServerQueryExecutorV1Impl

8a3821e

multi-stage-explain: add javadoc to TransformationTracker

f756aa1

multi-stage-explain: Move MultiStageExplainAskingServersUtils to org.apache.pinot.query.planner.explain package

df386f8

multi-stage-explain: Rename PinotExplainedRelNode.Info as ExplainInfo

267bc6c

multi-stage-explain: Fix PinotLogicalQueryPlanner not tracking mailbox send

532a2cf

It was an error introduced in 2ec071c

multi-stage-explain: Add license header

eca86d3

gortiz force-pushed the multi-stage-explain branch from d834ba9 to eca86d3 Compare August 13, 2024 10:12

multi-stage-explain: Simplify explain nodes when not using verbose mode

edc02da

gortiz added 3 commits September 11, 2024 12:30

multi-stage-explain: Remove unused parameters

2bf176b

multi-stage-explain: Improve error message

d367d0f

multi-stage-explain: Fixed expected explains

35d072e

yashmayya reviewed Sep 13, 2024

gortiz added 4 commits September 18, 2024 08:43

multi-stage-explain: remove TODO

05aedf5

Merge remote-tracking branch 'origin/master' into multi-stage-explain

ef8f448

# Conflicts: # pinot-query-runtime/src/main/java/org/apache/pinot/query/service/dispatch/DispatchClient.java # pinot-query-runtime/src/main/java/org/apache/pinot/query/service/server/QueryServer.java

Merge remote-tracking branch 'origin/master' into multi-stage-explain

312091e

multi-stage-explain: Register table name when explaining a leaf stage operator

212adb1

gortiz added 5 commits September 18, 2024 16:07

multi-stage-explain: Rename property pinot.query.explain.ask.servers as pinot.query.multistage.explain.include.segment.plan

d101cba

multi-stage-explain: Disable pinot.query.multistage.explain.include.segment.plan by default

b8134e0

multi-stage-explain: Rename leaf stage table explain attribute from tableName to table

28b85df

multi-stage-explain: Update test to leaf operator including table name

23c6b2c

Merge remote-tracking branch 'master' into multi-stage-explain

4fc70cd

gortiz force-pushed the multi-stage-explain branch from 8bea4a1 to 4fc70cd Compare September 23, 2024 10:44

yashmayya approved these changes Sep 23, 2024

Fix errors when merging changes in apache#13999

0b7b263

gortiz added 2 commits September 23, 2024 15:04

multi-stage-explain: Add a comparator for lists of strings

2722631

multi-stage-explain: Remove unnecessary merge type for leaf operator table

68a122a

gortiz mentioned this pull request Sep 24, 2024

Part-1: Pinot Timeseries Engine SPI #13885

Merged

Merge remote-tracking branch 'origin/master' into multi-stage-explain

6641543

Jackie-Jiang approved these changes Sep 24, 2024

gortiz merged commit c484fef into apache:master Sep 24, 2024

gortiz deleted the multi-stage-explain branch September 24, 2024 17:37

yashmayya reviewed Oct 1, 2024

yashmayya mentioned this pull request Oct 9, 2024

Rename multi-stage engine's explain asking servers to explain include segment plan #14193

Closed

gortiz mentioned this pull request Oct 9, 2024

Fix Bug in Handling Empty Filters in Time Series + Minor Fixes #14192

Merged
