Conversation

Contributor

@ankitsultana ankitsultana commented Mar 25, 2025

Summary

Adds basic constructs required for Physical Optimizers as specified in this doc: https://docs.google.com/document/d/17ApZbvNphKgEdSAOlZwTwAnnL_dAt9QbjcpzjHb4M0w/edit?tab=t.0

This is largely based on the PoC I had raised a few weeks back, with some changes: #15282

Changes wrt PoC

PRelNode Interface

PRelNode is no longer a single concrete class that wraps RelNode. Instead, I have modeled it as a separate interface. This not only keeps us somewhat compatible with Calcite, it will also clean up the code quite a bit.

The main drawback of this approach is that we'll have to define our own Sort, Window, and other nodes. But that is what Calcite recommends, as @gortiz has also pointed out before.

To show what this looks like, this PR adds a few custom physical plan nodes (exchange, filter, and table scan).
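
As a rough illustration of the interface-based design, here is a minimal sketch. It is not the code in this PR: every name is hypothetical (prefixed with `Sketch`), the methods on the interface are assumptions, and Calcite's RelNode is replaced by a stand-in interface so the example compiles on its own.

```java
import java.util.List;

// Stand-in for Calcite's RelNode so the sketch needs no Calcite dependency (assumption).
interface SketchRelNode {
  String describe();
}

// Hypothetical physical-node contract: an interface that nodes implement,
// rather than a single wrapper class around RelNode.
interface SketchPRelNode {
  int getNodeId();

  List<SketchPRelNode> getPRelInputs();

  // Gives access to the logical view of the same node.
  SketchRelNode unwrap();
}

// A custom physical table scan: it is both a "RelNode" and a physical node.
final class SketchPhysicalTableScan implements SketchRelNode, SketchPRelNode {
  private final int _nodeId;
  private final String _tableName;

  SketchPhysicalTableScan(int nodeId, String tableName) {
    _nodeId = nodeId;
    _tableName = tableName;
  }

  @Override public String describe() { return "TableScan(" + _tableName + ")"; }
  @Override public int getNodeId() { return _nodeId; }
  @Override public List<SketchPRelNode> getPRelInputs() { return List.of(); }
  @Override public SketchRelNode unwrap() { return this; }
}

// A custom physical filter with exactly one physical input.
final class SketchPhysicalFilter implements SketchRelNode, SketchPRelNode {
  private final int _nodeId;
  private final String _condition;
  private final SketchPRelNode _input;

  SketchPhysicalFilter(int nodeId, String condition, SketchPRelNode input) {
    _nodeId = nodeId;
    _condition = condition;
    _input = input;
  }

  @Override public String describe() { return "Filter(" + _condition + ")"; }
  @Override public int getNodeId() { return _nodeId; }
  @Override public List<SketchPRelNode> getPRelInputs() { return List.of(_input); }
  @Override public SketchRelNode unwrap() { return this; }
}

// Walks a plan through the physical interface only.
final class SketchPlanPrinter {
  static String print(SketchPRelNode node) {
    StringBuilder sb = new StringBuilder(node.unwrap().describe());
    for (SketchPRelNode input : node.getPRelInputs()) {
      sb.append(" <- ").append(print(input));
    }
    return sb.toString();
  }
}
```

The cost mentioned above shows up directly in the sketch: each node type (scan, filter, and eventually Sort, Window, and so on) needs its own class, but code that walks the plan depends only on the interface.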

Implementation Related Changes in RuleExecutor

RuleExecutor implements PRelNodeTransformer. The transformer is a general concept: ideally, the physical planner should run a series of transformers, some of which may be executed via RuleExecutor.

Why? For some optimizations we may want complete control over the flow, and constructs like parent plan node tracking, PRelOptRuleCall, etc. may not be necessary. Moreover, the onMatch / matches contract might not fit the requirements well.

This is nothing new. Calcite has the concept of Program, Presto has PlanOptimizer, etc.
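
To make the transformer idea concrete, here is a minimal sketch under stated assumptions. None of these types are the PR's actual classes: the node, transformer, rule, and planner names are hypothetical, and the bottom-up traversal is just one possible strategy for a rule executor. It shows a rule-driven transformer and a free-form transformer running in sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately tiny plan node.
final class SketchNode {
  final String _name;
  final List<SketchNode> _inputs;

  SketchNode(String name, List<SketchNode> inputs) {
    _name = name;
    _inputs = List.copyOf(inputs);
  }
}

// The general concept: anything that maps a plan to a new plan.
interface SketchTransformer {
  SketchNode execute(SketchNode root);
}

// The matches / onMatch contract used by rule-based transformers.
interface SketchRule {
  boolean matches(SketchNode node);

  SketchNode onMatch(SketchNode node);
}

// One kind of transformer: rewrites inputs first, then applies the rule to the rebuilt node.
final class SketchRuleExecutor implements SketchTransformer {
  private final SketchRule _rule;

  SketchRuleExecutor(SketchRule rule) {
    _rule = rule;
  }

  @Override
  public SketchNode execute(SketchNode root) {
    List<SketchNode> newInputs = new ArrayList<>();
    for (SketchNode input : root._inputs) {
      newInputs.add(execute(input));
    }
    SketchNode rebuilt = new SketchNode(root._name, newInputs);
    return _rule.matches(rebuilt) ? _rule.onMatch(rebuilt) : rebuilt;
  }
}

// Example rule: a filter that is always true can be replaced by its input.
final class SketchDropTrivialFilter implements SketchRule {
  @Override
  public boolean matches(SketchNode node) {
    return node._name.equals("Filter(true)") && node._inputs.size() == 1;
  }

  @Override
  public SketchNode onMatch(SketchNode node) {
    return node._inputs.get(0);
  }
}

// The planner only knows about transformers, not about rules.
final class SketchPlanner {
  static SketchNode optimize(SketchNode root, List<SketchTransformer> transformers) {
    SketchNode current = root;
    for (SketchTransformer transformer : transformers) {
      current = transformer.execute(current);
    }
    return current;
  }
}
```

A transformer that does not fit the rule contract can be written directly, for example `root -> new SketchNode("Exchange", List.of(root))`, and passed to `SketchPlanner.optimize` alongside a `SketchRuleExecutor`.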

Test Plan / Rollout Safety

This is completely isolated code right now, and we'll keep it behind a flag for a few weeks until we test it in our clusters.


codecov-commenter commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 31.75966% with 159 lines in your changes missing coverage. Please review.

Project coverage is 62.89%. Comparing base (59551e4) to head (284101f).
Report is 1979 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ery/planner/physical/v2/PinotDataDistribution.java | 43.63% | 23 Missing and 8 partials ⚠️ |
| ...he/pinot/query/context/PhysicalPlannerContext.java | 0.00% | 25 Missing ⚠️ |
| ...uery/planner/physical/v2/HashDistributionDesc.java | 26.92% | 19 Missing ⚠️ |
| ...ry/planner/physical/v2/nodes/PhysicalExchange.java | 0.00% | 18 Missing ⚠️ |
| ...y/planner/physical/v2/nodes/PhysicalTableScan.java | 0.00% | 18 Missing ⚠️ |
| ...uery/planner/physical/v2/nodes/PhysicalFilter.java | 0.00% | 15 Missing ⚠️ |
| ...t/query/planner/physical/v2/TableScanMetadata.java | 0.00% | 12 Missing ⚠️ |
| ...ot/query/planner/physical/v2/ExchangeStrategy.java | 0.00% | 8 Missing ⚠️ |
| ...ot/query/planner/physical/v2/opt/RuleExecutor.java | 65.00% | 4 Missing and 3 partials ⚠️ |
| ...ache/pinot/query/planner/physical/v2/PRelNode.java | 20.00% | 4 Missing ⚠️ |

... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15371      +/-   ##
============================================
+ Coverage     61.75%   62.89%   +1.13%     
- Complexity      207     1375    +1168     
============================================
  Files          2436     2834     +398     
  Lines        133233   160078   +26845     
  Branches      20636    24542    +3906     
============================================
+ Hits          82274   100675   +18401     
- Misses        44911    51810    +6899     
- Partials       6048     7593    +1545     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 62.83% <31.75%> (+1.12%) ⬆️ |
| java-21 | 62.87% <31.75%> (+1.24%) ⬆️ |
| skip-bytebuffers-false | 62.88% <31.75%> (+1.13%) ⬆️ |
| skip-bytebuffers-true | 62.84% <31.75%> (+35.11%) ⬆️ |
| temurin | 62.89% <31.75%> (+1.13%) ⬆️ |
| unittests | 62.88% <31.75%> (+1.14%) ⬆️ |
| unittests1 | 55.88% <31.75%> (+8.99%) ⬆️ |
| unittests2 | 33.66% <0.00%> (+5.93%) ⬆️ |

@ankitsultana ankitsultana marked this pull request as ready for review March 27, 2025 00:48
@ankitsultana ankitsultana changed the title from "[WIP] Adding Basic Constructs for Enabling Physical Optimizers" to "[multistage] Adding Basic Constructs for Physical Optimization" Mar 27, 2025
*/
public class TableScanMetadata {
private final Set<String> _scannedTables;
private final Map<Integer, Map<String, List<String>>> _workedIdToSegmentsMap;
Contributor

What is the value here, Map<String, List<String>>? Is it workerId => table type => segments?

Could we use a class or add a comment so we don't have to track the initialization back to understand the structure?

Contributor Author

Good point. Added a javadoc comment.
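
For illustration, the alternative the reviewer raised (a small class instead of a bare nested map) could look like the sketch below. It assumes the reviewer's reading of the structure (workerId => table type => segments), which this thread does not explicitly confirm; the class name, method names, and the example table-type strings are hypothetical and not part of this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical wrapper that names each level of the nested map.
 * Assumed layout: workerId -> table type (e.g. "OFFLINE", "REALTIME") -> segment names.
 */
final class SketchWorkerSegments {
  private final Map<Integer, Map<String, List<String>>> _workerIdToSegmentsMap = new HashMap<>();

  /** Records the segments of the given table type that a worker should scan. */
  void assign(int workerId, String tableType, List<String> segments) {
    _workerIdToSegmentsMap
        .computeIfAbsent(workerId, k -> new HashMap<>())
        .put(tableType, List.copyOf(segments));
  }

  /** Returns the assigned segments, or an empty list if nothing was assigned. */
  List<String> segmentsFor(int workerId, String tableType) {
    return _workerIdToSegmentsMap
        .getOrDefault(workerId, Map.of())
        .getOrDefault(tableType, List.of());
  }
}
```

Callers then read `segmentsFor(0, "OFFLINE")` instead of chaining two `get` calls, so the meaning of each key is visible at the call site. A javadoc comment on the field, as was done here, conveys the same information with less code.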

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM on the high-level idea. We can revisit the details in follow-up PRs.

@ankitsultana ankitsultana merged commit 9db45e3 into apache:master Apr 2, 2025
22 checks passed
leujean02 pushed a commit to leujean02/pinot that referenced this pull request Apr 3, 2025
leujean02 pushed a commit to leujean02/pinot that referenced this pull request Apr 4, 2025
@ankitsultana ankitsultana added the multi-stage Related to the multi-stage query engine label Apr 8, 2025

Labels

mse-physical-optimizer multi-stage Related to the multi-stage query engine

4 participants