-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Add Support for inferRealtimeSegmentPartition #16023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16023 +/- ##
============================================
+ Coverage 62.90% 63.38% +0.48%
+ Complexity 1386 1357 -29
============================================
Files 2867 2912 +45
Lines 163354 166995 +3641
Branches 24952 25544 +592
============================================
+ Hits 102755 105848 +3093
- Misses 52847 53126 +279
- Partials 7752 8021 +269
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
itschrispeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, I think there should be no concerns deleting the other query option/logic since it was added after the latest release
wirybeaver
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double confirmation, this query option means the user prefer low query latency over the correctness of result.
|
@wirybeaver yup. But another use for this is long retention realtime tables where adding segment partition config will take a while to update zk metadata. |
|
Please add some documentation about how to use this new option |
Summary
In #15760 we had added support for inferring invalid segment partition id. That didn't work completely for us because we one of our colocated join use-cases also leverages upsert compaction, and we are seeing a behavior where after compaction and segment only has invalid segment partition records, which changes that segment's partition. Since segment assignment is not done again as part of compaction, this leads to segments for a given partition to span multiple servers.
In this PR I have marked the old query option as Deprecated and added a new one which is broader and is capable of handling this edge case too.
Test Plan
Added quite a bit of unit-tests. We also have existing unit-tests which do sanity testing. Testing this particular scenario E2E is hard but we are planning to deploy this patch to our cluster within a few days itself.