Skip to content

Conversation

@somandal
Copy link
Contributor

@somandal somandal commented May 19, 2022

A vast majority of the code changes are in the tests.

The existing behavior for EXPLAIN PLAN has the following limitations:

  • The plan query is sent to only one random server with a random segment. This can have the following issues:
    • The segment may get pruned on the server side
    • The segment may produce an Empty filter during the operator tree creation
    • The segment may produce a Match All filter during the operator tree creation
    • AND and OR operators may result in one or both predicate subtrees getting degenerated into an Empty or Match All filter. Due to this the AND or OR subtree may be converted into a leaf level predicate. This leads to confusion in the user's mind regarding the output of the explain plan.
  • The overall approach is not a good representation of the distributed query planning for each segment

This PR address the above limitations by the following changes:

  • Send the plan query to all segments on all servers (the ones chosen by the Broker after broker side pruning)
    • Each server returns a set of deduplicated plans along with the count of number of segments matching each plan
    • Each server returns the following data as part of the metadata:
      • Number of segments pruned by the server side
      • Number of segments with an empty filter tree
      • Number of segments with a match all filter tree
    • On the Broker side the set of plans returned by each server is again deduplicated and the number of segments matching each plan is updated for each unique plan
    • Each broker returns the number of segments pruned by the broker side as part of the BrokerResponse
    • Adds a verbose explain plan option which returns all of the deduplicated plans.
    • If verbose is disabled (default option) then a single explain plan out of the deduplicated plan is returned with the deepest plan tree. This is a better approximation than the current explain plan functionality due to deduplication across servers and segments.

Example query (verbose enabled):

EXPLAIN PLAN FOR SELECT invertedIndexCol1, noIndexCol1 FROM testTable WHERE startsWith (textIndexCol1, 'daff') AND noIndexCol4 OPTION(explainPlanVerbose=true)

Here's an example of the Explain Plan output with verbose mode enabled:

BROKER_REDUCE
    COMBINE_SELECT
        PLAN_START(numSegmentsForThisPlan:3)
            SELECT(selectList:invertedIndexCol1, noIndexCol1)
                TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                     PROJECT(invertedIndexCol1, noIndexCol1)
                         DOC_ID_SET
                             FILTER_AND
                                 FILTER_FULL_SCAN(operator:EQ,predicate:noIndexCol4 = 'true')"
                                 FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')
        PLAN_START(numSegmentsForThisPlan:1)
            SELECT(selectList:invertedIndexCol1, noIndexCol1)
                TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                     PROJECT(invertedIndexCol1, noIndexCol1)
                         DOC_ID_SET
                             FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')

Example query (verbose disabled):

EXPLAIN PLAN FOR SELECT invertedIndexCol1, noIndexCol1 FROM testTable WHERE startsWith (textIndexCol1, 'daff') AND noIndexCol4

With verbose mode disabled, only the first plan will be selected as it has the deepest tree:

BROKER_REDUCE
    COMBINE_SELECT
        PLAN_START(numSegmentsForThisPlan:3)
            SELECT(selectList:invertedIndexCol1, noIndexCol1)
                TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                     PROJECT(invertedIndexCol1, noIndexCol1)
                         DOC_ID_SET
                             FILTER_AND
                                 FILTER_FULL_SCAN(operator:EQ,predicate:noIndexCol4 = 'true')"
                                 FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')

cc @siddharthteotia

…rbose mode, b) deepest operator tree in normal mode based on some precedence ordering
@codecov-commenter
Copy link

codecov-commenter commented May 19, 2022

Codecov Report

Merging #8738 (9ef370f) into master (96a0291) will increase coverage by 0.05%.
The diff coverage is 87.61%.

@@             Coverage Diff              @@
##             master    #8738      +/-   ##
============================================
+ Coverage     69.65%   69.70%   +0.05%     
- Complexity     4612     4615       +3     
============================================
  Files          1730     1732       +2     
  Lines         90314    90596     +282     
  Branches      13421    13476      +55     
============================================
+ Hits          62904    63152     +248     
- Misses        23042    23050       +8     
- Partials       4368     4394      +26     
Flag Coverage Δ
integration1 27.02% <78.01%> (+0.16%) ⬆️
integration2 25.26% <74.92%> (-0.17%) ⬇️
unittests1 66.30% <85.47%> (+0.13%) ⬆️
unittests2 14.25% <4.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...inot/core/operator/filter/EmptyFilterOperator.java 100.00% <ø> (+25.00%) ⬆️
...t/core/operator/filter/MatchAllFilterOperator.java 100.00% <ø> (ø)
...va/org/apache/pinot/spi/utils/CommonConstants.java 21.31% <ø> (ø)
...g/apache/pinot/core/common/ExplainPlanRowData.java 62.50% <62.50%> (ø)
.../org/apache/pinot/core/common/ExplainPlanRows.java 82.85% <82.85%> (ø)
...core/query/executor/ServerQueryExecutorV1Impl.java 82.59% <85.10%> (+0.40%) ⬆️
...core/query/reduce/ExplainPlanDataTableReducer.java 84.13% <87.82%> (+9.13%) ⬆️
...roker/requesthandler/BaseBrokerRequestHandler.java 71.52% <100.00%> (+0.01%) ⬆️
...che/pinot/broker/routing/BrokerRoutingManager.java 85.31% <100.00%> (-0.08%) ⬇️
...ker/routing/instanceselector/InstanceSelector.java 100.00% <100.00%> (ø)
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96a0291...9ef370f. Read the comment docs.

@siddharthteotia
Copy link
Contributor

siddharthteotia commented May 19, 2022

@Jackie-Jiang / @richardstartin - please help take a look at the improvements here when you get a chance. I have offline worked with @somandal on the changes before putting out the PR.

@amrishlal
Copy link
Contributor

Since the EXPLAIN PLAN output now gives a much broader picture, I am wondering if it makes sense to delineate the steps between server, broker, and segment a bit more, i.e. replacing the COMBINE_XXX nodes with SERVER_COMBINE_XXX nodes and replacing PLAN_START with SEGMENT_PLAN (or similar)? That way it would be much easier to tell what happened on broker or server or at segment level?

@siddharthteotia
Copy link
Contributor

The issue #8607 should be fixed by this PR

@somandal
Copy link
Contributor Author

Since the EXPLAIN PLAN output now gives a much broader picture, I am wondering if it makes sense to delineate the steps between server, broker, and segment a bit more, i.e. replacing the COMBINE_XXX nodes with SERVER_COMBINE_XXX nodes and replacing PLAN_START with SEGMENT_PLAN (or similar)? That way it would be much easier to tell what happened on broker or server or at segment level?

@amrishlal this is a good idea, thanks for the suggestion! Discussed with @siddharthteotia and we decided to pick this up as a future enhancement (we plan to add additional explain nodes and other improvements). I'll create a ticket and a TODO for this.

@richardstartin
Copy link
Member

Rather than prefix the names for the different components, why not add another column with the component?

@richardstartin
Copy link
Member

Can you document how to toggle verbose mode?

Copy link
Member

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be very useful, thank you

@somandal
Copy link
Contributor Author

Rather than prefix the names for the different components, why not add another column with the component?

Yup, that's a great idea. I'll pick this up as part of my other Explain plan improvements. Thanks for the suggestion!

@somandal
Copy link
Contributor Author

Can you document how to toggle verbose mode?

I've updated the comment to show the query option for verbose vs. non-verbose. Is there somewhere else this needs to be documented?

@siddharthteotia
Copy link
Contributor

@somandal we can update the Pinot user docs explaining the new behavior and how to use verbose mode. I can help you do that offline.

@siddharthteotia siddharthteotia merged commit 8219766 into apache:master May 21, 2022
@somandal somandal deleted the explain-plan-accuracy branch May 22, 2022 04:54
@siddharthteotia siddharthteotia added the release-notes Referenced by PRs that need attention when compiling the next release notes label May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants