Skip to content

Conversation

@ankitsultana
Copy link
Contributor

Summary

In #15760 we had added support for inferring invalid segment partition id. That didn't work completely for us because we one of our colocated join use-cases also leverages upsert compaction, and we are seeing a behavior where after compaction and segment only has invalid segment partition records, which changes that segment's partition. Since segment assignment is not done again as part of compaction, this leads to segments for a given partition to span multiple servers.

In this PR I have marked the old query option as Deprecated and added a new one which is broader and is capable of handling this edge case too.

Test Plan

Added quite a bit of unit-tests. We also have existing unit-tests which do sanity testing. Testing this particular scenario E2E is hard but we are planning to deploy this patch to our cluster within a few days itself.

@ankitsultana ankitsultana added multi-stage Related to the multi-stage query engine mse-physical-optimizer labels Jun 5, 2025
@ankitsultana ankitsultana requested a review from itschrispeck June 5, 2025 21:13
@codecov-commenter
Copy link

codecov-commenter commented Jun 5, 2025

Codecov Report

Attention: Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.38%. Comparing base (1a476de) to head (13bc478).
Report is 207 commits behind head on master.

Files with missing lines Patch % Lines
...al/v2/opt/rules/LeafStageWorkerAssignmentRule.java 83.33% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16023      +/-   ##
============================================
+ Coverage     62.90%   63.38%   +0.48%     
+ Complexity     1386     1357      -29     
============================================
  Files          2867     2912      +45     
  Lines        163354   166995    +3641     
  Branches      24952    25544     +592     
============================================
+ Hits         102755   105848    +3093     
- Misses        52847    53126     +279     
- Partials       7752     8021     +269     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.36% <84.00%> (+0.49%) ⬆️
java-21 63.33% <84.00%> (+0.51%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.38% <84.00%> (+0.48%) ⬆️
unittests 63.38% <84.00%> (+0.48%) ⬆️
unittests1 56.50% <84.00%> (+0.68%) ⬆️
unittests2 33.40% <0.00%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@itschrispeck itschrispeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, I think there should be no concerns deleting the other query option/logic since it was added after the latest release

Copy link
Contributor

@wirybeaver wirybeaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double confirmation, this query option means the user prefer low query latency over the correctness of result.

@itschrispeck itschrispeck merged commit 3c8b260 into apache:master Jun 6, 2025
18 checks passed
@ankitsultana
Copy link
Contributor Author

@wirybeaver yup. But another use for this is long retention realtime tables where adding segment partition config will take a while to update zk metadata.

@Jackie-Jiang
Copy link
Contributor

Please add some documentation about how to use this new option

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation mse-physical-optimizer multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants