Skip to content

Conversation

@ankitsultana
Copy link
Contributor

Summary

For Realtime streams, there can be scenarios where a very small percentage of records will be incorrectly assigned to a Stream Partition (when partitioning by a key is enabled).

In such a case, the Physical Optimizer will assume the Table Scan to be un-partitioned.

This is an overkill for some use-cases, where minute errors (e.g. <0.0001%) are acceptable.

This PR adds a query option to enable this feature. The behavior is that we will infer the partition of the segment based on its name. This only works with Realtime segments since for offline segments batch ingestion should be easily able to guarantee that the partitioning is done correctly. (we also don't have a standard way to infer a partition in that case)

I have also added some metrics which were missing with the Physical Optimizer (join count, window count, etc.). These were emitted in RelToPlanNodeConverter. I now instead do it in a centralized place so we can also emit more metrics in the future.

Testing

Added UTs for Leaf Stage worker assignment. We are also testing this out in one of our bigger clusters.

@ankitsultana ankitsultana requested a review from itschrispeck May 9, 2025 22:01
@ankitsultana ankitsultana added multi-stage Related to the multi-stage query engine mse-physical-optimizer labels May 9, 2025
@codecov-commenter
Copy link

codecov-commenter commented May 9, 2025

Codecov Report

Attention: Patch coverage is 84.28571% with 11 lines in your changes missing coverage. Please review.

Project coverage is 63.25%. Comparing base (1a476de) to head (bd76e66).
Report is 61 commits behind head on master.

Files with missing lines Patch % Lines
...ery/planner/physical/v2/PRelNodeTreeValidator.java 77.27% 3 Missing and 2 partials ⚠️
...al/v2/opt/rules/LeafStageWorkerAssignmentRule.java 89.74% 0 Missing and 4 partials ⚠️
...he/pinot/query/context/PhysicalPlannerContext.java 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15760      +/-   ##
============================================
+ Coverage     62.90%   63.25%   +0.35%     
- Complexity     1386     1388       +2     
============================================
  Files          2867     2888      +21     
  Lines        163354   165241    +1887     
  Branches      24952    25259     +307     
============================================
+ Hits         102755   104524    +1769     
+ Misses        52847    52826      -21     
- Partials       7752     7891     +139     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.23% <84.28%> (+0.35%) ⬆️
java-21 63.22% <84.28%> (+0.40%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.25% <84.28%> (+0.35%) ⬆️
unittests 63.25% <84.28%> (+0.35%) ⬆️
unittests1 56.27% <84.28%> (+0.45%) ⬆️
unittests2 33.50% <1.42%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wirybeaver
Copy link
Contributor

lgtm

Copy link
Contributor

@MeihanLi MeihanLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@itschrispeck itschrispeck merged commit 8b82a41 into apache:master May 10, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mse-physical-optimizer multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants