-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Handle Excluded New Segments in MSE Physical Optimizer #15780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4b6c2a9 to
1ef191a
Compare
| List<String> selectedSegments = new ArrayList<>(); | ||
| if (info != null) { | ||
| List<String> segmentsForPartition = tablePartitionInfo.getSegmentsInPartition(partitionNum); | ||
| if (!segmentsForPartition.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: Though this check is not required, I think it makes the code slightly easier to read
| public TablePartitionInfo(String tableNameWithType, String partitionColumn, String partitionFunctionName, | ||
| int numPartitions, PartitionInfo[] partitionInfoMap, List<String> segmentsWithInvalidPartition) { | ||
| int numPartitions, PartitionInfo[] partitionInfoMap, List<String> segmentsWithInvalidPartition, | ||
| Map<Integer, List<String>> excludedNewSegments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mark the existing constructor with @VisibleForTesting and create a new constructor to pass in the excludedNewSegments. The modification of existing unit test of TablePartitionInfo don't need to be modified in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo having multiple ctors is more error prone in the long run. It's easy to call ctors with lower number of args and developers then don't think about everything that's required to create an accurate version of an object.
|
LGTM. My hunch is that the broker routing manager can insert the partition number into segment when the helix thread invoke the update method. It will simplify the code of workerAssignment a lot and reduce the overhead in the query path. Specifically, the core part of workerAssignment is to ensure the partition is not assigned to multiple servers. As we have |
wirybeaver
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a global unit test failure on the pinot-core module
shauryachats
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you need to compute the partition info without maintaining fully replicated servers? If so, I'd suggest adding a new type of TablePartitionInfo and directly calculate the info needed for the new physical planner, e.g. all segments within the partition, segments with invalid partition etc.
This way, we don't need to pay overhead to maintain info not needed for both approaches.
I don't follow, which overhead is concerning here? Note that this change is simply tracking the excludedNewSegments in TPI. These were created temporarily anyways, but all I am doing is passing them to From a semantics point of view I think |
|
What I meant is that the new
Because they need different info, I'd suggest adding a new class similar to |
|
Isn't that an overkill? Requirements of
Maybe I am missing something, is there a deeper reason to create another POJO here? Maintaining two separate POJOs with mostly the same information will be an anti-pattern in my view. |
itschrispeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side
The main difference between them is as following:
The reason why Because the info needed for them are actually different, even though they share common properties (e.g. which partition a segment belongs to), it will be easier to maintain if keep the logic separate. With current approach, there are overhead for both use cases:
|
|
On second thought, the current changes in this PR are not semantically correct. Like Jackie mentioned, I am only interested in getting the segments corresponding to a given partition from Segment Partition Metadata Manager. The segment selection, and exclusion of new segments, is delegated to Routing Manager, as it ideally should be. Given that, me adding excludedNewSegments to the existing POJO isn't ideal since that will allow the exclusion logic to be owned by both SegmentPartitionMetadataManager and Routing Manager. I'll update this soon. |
1ef191a to
4990bbb
Compare
|
|
||
|
|
||
| /** | ||
| * Tracks segments by partition for a table. Also tracks the invalid partition segments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: moved existing TablePartitionInfo to TablePartitionReplicatedServersInfo. This one only tracks segments by partition and the invalid partition segments.
|
|
||
| private void computeTablePartitionInfo() { | ||
| @Override | ||
| public synchronized void onAssignmentChange(IdealState idealState, ExternalView externalView, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: GH is not great at showing this but I have simply moved the compute table partition info methods towards the end. however reviewers can double check this claim.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #15780 +/- ##
============================================
+ Coverage 62.90% 63.33% +0.43%
+ Complexity 1386 1354 -32
============================================
Files 2867 2898 +31
Lines 163354 166410 +3056
Branches 24952 25453 +501
============================================
+ Hits 102755 105398 +2643
- Misses 52847 53042 +195
- Partials 7752 7970 +218
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| _segmentInfoMap.put(segment, segmentInfo); | ||
| } | ||
| computeTablePartitionInfo(); | ||
| computeAllTablePartitionInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding config to turn each of them on/off to reduce overhead. Can be done as a follow up
Summary
The Leaf stage worker assignment rule was iterating on
PartitionInfo._segmentslist. But that list is not complete, andSegmentPartitionMetadataManagerwas skipping the new segments fromTablePartitionInfo.This PR fixes that by:
TablePartitionInfotoTablePartitionReplicatedServersInfo(TPRSI), so it's clear that the TPRSI POJO also tracks segment replication (i.e. segment/instance assignment)TablePartitionReplicatedServersInfofromLeafStageWorkerAssignmentRuleand only relying on the newTablePartitionInfo(TPI).Big Picture
The bigger picture here is that the v2 query optimizer relies on
TablePartitionInfosolely for quickly accessing segments belonging to a given partition. Prior to this PR,TablePartitionInfoused by the leaf stage worker assignment was also automatically skipping new segments from being tracked in the TPI.This was bad design, because the v2 query optimizer delegates the responsibility of segment selection and assignment to the Routing Manager.
Test Plan
Added Unit Tests. We also have existing unit-tests and E2E plan tests that verify plan generation.