RebalanceStatus API changes #10359
Codecov Report
@@ Coverage Diff @@
## master #10359 +/- ##
=============================================
- Coverage 70.30% 13.93% -56.37%
+ Complexity 6026 259 -5767
=============================================
Files 2049 2007 -42
Lines 111060 109281 -1779
Branches 16894 16692 -202
=============================================
- Hits 78078 15230 -62848
- Misses 27518 92814 +65296
+ Partials 5464 1237 -4227
... and 1613 files with indirect coverage changes
@navina, @Jackie-Jiang - Could you take a look at the overall approach and provide feedback?
It is great to list the 3 different convergences. We might want to track different stats for each of them, though.
Thanks @Jackie-Jiang for taking a look. Here's the summary of the discussion (review feedback)
Side question: do we allow multiple rebalance requests from a user in parallel? Would it be better to reject a rebalance request while one is already in progress?
Yes, today nothing prevents multiple rebalances from running in parallel on the same table. This does not impact correctness, as the rebalance eventually converges. I think it would be good to find out what consequences this has on both the user and system sides.
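For illustration only, the "reject while one is in progress" idea from the question above could be sketched as follows. Everything here is hypothetical: the class `RebalanceGuardSketch` and its methods are not part of this PR or of Pinot, and an in-memory map only covers a single controller process, so a real guard would need shared state that all controllers can see.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a per-table rebalance guard; not Pinot code. */
final class RebalanceGuardSketch {
  // Maps tableNameWithType -> jobId of the rebalance currently holding the table.
  private final ConcurrentMap<String, String> inProgress = new ConcurrentHashMap<>();

  /** Returns true if jobId now owns the table, false if another rebalance already does. */
  boolean tryStart(String tableNameWithType, String jobId) {
    // putIfAbsent is atomic, so two concurrent callers cannot both win.
    return inProgress.putIfAbsent(tableNameWithType, jobId) == null;
  }

  /** Releases the table, but only if it is still held by the given jobId. */
  void finish(String tableNameWithType, String jobId) {
    inProgress.remove(tableNameWithType, jobId);
  }
}
```

A caller would invoke `tryStart` before kicking off the rebalance and `finish` in a `finally` block, so that a failed rebalance does not leave the table locked.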
Jackie-Jiang left a comment
LGTM otherwise. Good job extracting the observer
Problem: The status of a table rebalance cannot be tracked after it is initiated, which leads to field-reported issues - 1, 2
Solution: Provide stats that help users track rebalance progress. Sample stats are shown in the Testing section.
API for stats (rebalanceStatus/jobId)
A rebalance operation returns a rebalanceId (the jobId in the path above), which is then used to query rebalanceStatus, for example:
http://localhost:9000/rebalanceStatus/f83093b1-6b1e-47f3-8721-7984d442815d
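A client could query this endpoint roughly as sketched below. This is a minimal sketch, not code from the PR: the class `RebalanceStatusClientSketch` and its method names are hypothetical, only the `rebalanceStatus/{jobId}` path comes from the description above, and the regex-based `extractStatus` is a shortcut that a real client would replace with a proper JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical client-side helper for the rebalanceStatus endpoint; not part of this PR. */
final class RebalanceStatusClientSketch {
  // Matches the first "status":"..." pair in the response body.
  private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"([^\"]*)\"");

  /** Builds <controller>/rebalanceStatus/<jobId>, tolerating a trailing slash on the base URL. */
  static URI statusUri(String controllerBaseUrl, String jobId) {
    String base = controllerBaseUrl.endsWith("/")
        ? controllerBaseUrl.substring(0, controllerBaseUrl.length() - 1)
        : controllerBaseUrl;
    return URI.create(base + "/rebalanceStatus/" + jobId);
  }

  /** Pulls the "status" value out of a response body, if one is present. */
  static Optional<String> extractStatus(String responseBody) {
    Matcher m = STATUS.matcher(responseBody);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  /** Issues a single GET (Java 11+ HttpClient) and returns the raw JSON body. */
  static String fetchStatusBody(String controllerBaseUrl, String jobId)
      throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(statusUri(controllerBaseUrl, jobId)).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Unexpected HTTP status: " + response.statusCode());
    }
    return response.body();
  }
}
```

For example, `extractStatus(fetchStatusBody("http://localhost:9000", rebalanceId))` would yield the `status` field of the response when the controller is reachable.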
Code changes: stats are collected at three points:
1) At the start of the rebalance, when the initial and target states are known. This tells us the amount of work the rebalance starts with.
2) In the rebalance phase that waits for the external view to converge to the ideal state. This tells us the pending work for the rebalance.
3) If/when the ideal state changes due to other events and a new target is computed. This tells us the work involved when the ideal state changes multiple times during the rebalance.
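Each of the three points above compares one segment assignment against another, so the stats can be thought of as a diff between two maps. The sketch below shows one way such a diff might be computed. It is an assumption-laden illustration rather than the PR's `TableRebalanceProgressStats`: the class `ConvergenceStatsSketch`, the `diff` method, and the exact definition of each counter are all hypothetical, chosen only to mirror the four field names that appear in the sample output.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of diffing two segment assignments; not the PR's implementation. */
final class ConvergenceStatsSketch {
  /** Holder mirroring the four fields shown in the sample output. */
  static final class Stats {
    final int segmentsMissing;
    final int segmentsToRebalance;
    final double percentSegmentsToRebalance;
    final int replicasToRebalance;

    Stats(int segmentsMissing, int segmentsToRebalance, double percentSegmentsToRebalance,
        int replicasToRebalance) {
      this.segmentsMissing = segmentsMissing;
      this.segmentsToRebalance = segmentsToRebalance;
      this.percentSegmentsToRebalance = percentSegmentsToRebalance;
      this.replicasToRebalance = replicasToRebalance;
    }
  }

  /**
   * Both maps are segment -> (instance -> state).
   * Assumed definitions: a replica is pending when the target (instance, state) pair is not
   * present in the current map; a segment is missing when the current map has no entry for it.
   */
  static Stats diff(Map<String, Map<String, String>> current,
      Map<String, Map<String, String>> target) {
    int missing = 0;
    int segmentsToRebalance = 0;
    int replicasToRebalance = 0;
    for (Map.Entry<String, Map<String, String>> segment : target.entrySet()) {
      Map<String, String> currentReplicas = current.get(segment.getKey());
      if (currentReplicas == null) {
        missing++;
        currentReplicas = Collections.emptyMap();
      }
      int pending = 0;
      for (Map.Entry<String, String> replica : segment.getValue().entrySet()) {
        if (!replica.getValue().equals(currentReplicas.get(replica.getKey()))) {
          pending++;
        }
      }
      if (pending > 0) {
        segmentsToRebalance++;
        replicasToRebalance += pending;
      }
    }
    double percent = target.isEmpty() ? 0.0 : 100.0 * segmentsToRebalance / target.size();
    return new Stats(missing, segmentsToRebalance, percent, replicasToRebalance);
  }
}
```

Under these assumptions, calling `diff(initialState, targetState)` would correspond to point 1, `diff(externalView, idealState)` to point 2, and `diff(currentState, newTargetState)` to point 3.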
Testing
{
"tableRebalanceProgressStats": {
"status": "DONE",
"completionStatusMsg": "Finished rebalancing table: billing_OFFLINE with minAvailableReplicas: 1, enableStrictReplicaGroup: false, bestEfforts: false in 32 ms.",
"initialToTargetStateConvergence": {
"_segmentsMissing": 0,
"_segmentsToRebalance": 1,
"_percentSegmentsToRebalance": 100,
"_replicasToRebalance": 9
},
"externalViewToIdealStateConvergence": {
"_segmentsMissing": 0,
"_segmentsToRebalance": 0,
"_percentSegmentsToRebalance": 0,
"_replicasToRebalance": 0
},
"currentToTargetConvergence": {
"_segmentsMissing": 0,
"_segmentsToRebalance": 0,
"_percentSegmentsToRebalance": 0,
"_replicasToRebalance": 0
},
"startTimeInMilliseconds": 1678214633564,
"timeToFinishInSeconds": 0
},
"timeElapsedSinceStartInSeconds": 9
}
Output of http://localhost:9000/table/airlineStats_OFFLINE/jobs?type=OFFLINE
{
"178b0d79-92d8-43c7-9aa0-0448da01fa94": {
"jobId": "178b0d79-92d8-43c7-9aa0-0448da01fa94",
"messageCount": "10",
"submissionTimeMs": "1678302700785",
"jobType": "RELOAD_ALL_SEGMENTS",
"tableName": "airlineStats_OFFLINE"
},
"095855b6-5b9b-48d5-ad97-c7a2db2d1b7b": {
"jobId": "095855b6-5b9b-48d5-ad97-c7a2db2d1b7b",
"submissionTimeMs": "1678302711030",
"REBALANCE_PROGRESS_STATS": "{\"status\":\"DONE\",\"timeToFinishInSeconds\":0,\"completionStatusMsg\":\"Finished rebalancing table: airlineStats_OFFLINE with minAvailableReplicas: 1, enableStrictReplicaGroup: false, bestEfforts: false in 41 ms.\",\"startTimeInMilliseconds\":1.678302711006E12,\"initialToTargetStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":31,\"_percentSegmentsToRebalance\":100.0,\"_replicasToRebalance\":279},\"externalViewToIdealStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0},\"currentToTargetConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0}}",
"tableName": "airlineStats_OFFLINE"
}
}