Skip to content

Conversation

@ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Jul 7, 2025

Summary

This PR fixes some assumptions in the v2 query optimizer, specifically about hash functions. With this PR, we achieve the following:

  1. We have an explicit list of supported hash functions defined in DistHashFunction. If a table uses some other hash function like modulo, then we consider the table-scan un-partitioned. We only allow those hash functions in table-scan which are also supported by KeySelector for hash exchange.
  2. PhysicalExchange tracks hash function separately to indicate the Hash Function to be used in KeySelector.

Before, we would simply assume that a hash function we choose during planning could be used at runtime for shuffling, but our shuffle runtime was always using absHashCode, which would have yielded incorrect results.

Test Plan

UTs have decent coverage. I am also doing some manual testing and figuring out how to add tests for this

@ankitsultana ankitsultana added multi-stage Related to the multi-stage query engine mse-physical-optimizer labels Jul 7, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 78.18182% with 12 lines in your changes missing coverage. Please review.

Project coverage is 63.19%. Comparing base (1a476de) to head (6e2aedd).
Report is 417 commits behind head on master.

Files with missing lines Patch % Lines
...cal/v2/opt/rules/WorkerExchangeAssignmentRule.java 63.63% 8 Missing ⚠️
...he/pinot/query/context/PhysicalPlannerContext.java 75.00% 1 Missing ⚠️
...y/planner/physical/v2/PRelToPlanNodeConverter.java 0.00% 1 Missing ⚠️
...ry/planner/physical/v2/nodes/PhysicalExchange.java 75.00% 1 Missing ⚠️
...al/v2/opt/rules/LeafStageWorkerAssignmentRule.java 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16296      +/-   ##
============================================
+ Coverage     62.90%   63.19%   +0.29%     
+ Complexity     1386     1364      -22     
============================================
  Files          2867     2966      +99     
  Lines        163354   171673    +8319     
  Branches      24952    26296    +1344     
============================================
+ Hits         102755   108497    +5742     
- Misses        52847    54919    +2072     
- Partials       7752     8257     +505     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.17% <78.18%> (+0.30%) ⬆️
java-21 63.18% <78.18%> (+0.35%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.19% <78.18%> (+0.29%) ⬆️
unittests 63.19% <78.18%> (+0.29%) ⬆️
unittests1 56.33% <78.18%> (+0.51%) ⬆️
unittests2 33.37% <0.00%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ankitsultana ankitsultana requested a review from itschrispeck July 8, 2025 00:46
@ankitsultana ankitsultana changed the title [multistage] Support Different Hash Functions Gracefully in V2 Optimizer [multistage] Support Hash Functions Gracefully in V2 Optimizer Jul 8, 2025
@ankitsultana ankitsultana marked this pull request as ready for review July 8, 2025 00:47
@ankitsultana ankitsultana merged commit a02253d into apache:master Jul 8, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mse-physical-optimizer multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants