Skip to content

Conversation

@ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Jul 10, 2025

Note: Only touches physical optimizer code.

Summary

Enables the H2 comparison test in ResourceBasedQueriesTest and fixes the last set of bugs that I couldn't get to before. The bugs and changes are summarized below.

Fix Trait Assignment for SetOp

For SetOp, we need to pushdown a hash distribution trait with all fields. Except for UNION ALL, all other set operations require this hash dist trait.

UNION ALL doesn't need it, because it allows duplicates.

Fixes Minus

I was creating PhysicalUnion for a minus 🤦 .

Adds Intersect

Adds support for intersect.

Fixes Multi-Column Join

RelDistribution.hash(keys) normalizes the keys. That means that if you pass in keys=[1, 0, 4], it will get sorted and stored as [0, 1, 4].

When computing hashDistToMatch in WorkerExchangeAssignmentRule, I was using List to List comparison for the HashDistributionDesc and the RelDistribution.hash keys. Changed it to do set to set comparison.

In the future we can even change the keys to a set in HashDistributionDesc, or normalize the keys in HashDistributionDesc too.

Fixes Runtime Support for SetOp

I was only handling joins in meetParentEnforcedDistributionConstraint. Extended it to handle SetOp too.

Test Plan

Unit Tests have extensive coverage. There are around 200+ queries run in the new test I enabled in ResourceBasedQueriesTest, where the same query is run with H2 and the physical optimizer.

@ankitsultana ankitsultana added bugfix multi-stage Related to the multi-stage query engine mse-physical-optimizer labels Jul 10, 2025
@ankitsultana ankitsultana marked this pull request as ready for review July 11, 2025 04:38
@codecov-commenter
Copy link

codecov-commenter commented Jul 11, 2025

Codecov Report

Attention: Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 63.19%. Comparing base (1a476de) to head (b81ccb3).
Report is 442 commits behind head on master.

Files with missing lines Patch % Lines
...cal/v2/opt/rules/WorkerExchangeAssignmentRule.java 85.71% 0 Missing and 2 partials ⚠️
...ache/pinot/calcite/rel/traits/TraitAssignment.java 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16330      +/-   ##
============================================
+ Coverage     62.90%   63.19%   +0.28%     
+ Complexity     1386     1364      -22     
============================================
  Files          2867     2974     +107     
  Lines        163354   172591    +9237     
  Branches      24952    26435    +1483     
============================================
+ Hits         102755   109062    +6307     
- Misses        52847    55208    +2361     
- Partials       7752     8321     +569     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.15% <93.75%> (+0.28%) ⬆️
java-21 63.17% <93.75%> (+0.34%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.19% <93.75%> (+0.28%) ⬆️
unittests 63.18% <93.75%> (+0.28%) ⬆️
unittests1 56.41% <93.75%> (+0.59%) ⬆️
unittests2 33.18% <0.00%> (-0.39%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Override
public SetOp copy(RelTraitSet traitSet, List<RelNode> inputs, boolean all) {
return new PhysicalIntersect(
getCluster(), traitSet, getHints(), inputs.stream().map(PRelNode.class::cast).collect(Collectors.toList()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all RelNode castable to PRelNode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, but our design is such that after RelNode optimization steps, we convert all nodes to PRelNode, and then only operate with them.

In other words, if a copy is called on a PRelNode, then it will only be when everything is a PRelNode.

@ankitsultana ankitsultana force-pushed the mse-physical-lm-tests branch from 6fe7df4 to b81ccb3 Compare July 11, 2025 18:35
@itschrispeck itschrispeck merged commit f444fc2 into apache:master Jul 11, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix mse-physical-optimizer multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants