Skip to content

Conversation

@swaminathanmanish
Copy link
Contributor

@swaminathanmanish swaminathanmanish commented Mar 1, 2023

Problem: Unable to track status of table rebalance after it is initiated which leads to

  • Not knowing whether a rebalance is taking time or is indefinitely stuck.
  • Operational burden (productivity impact) to track status and debug.
    Field reported Issues - 1, 2

Solution: Provide stats that helps users track rebalance progress -

  • Difference between initial and target states (what rebalance starts with)
  • Difference between ideal state and external view (tells us pending work)
  • Difference between ideal state and recomputed target state (if rebalance takes multiple iterations when ideal state changes, during the rebalance)

Stats are shown in the test section.
API for stats (rebalanceStatus/jobId)
A rebalance operation will return a rebalanceId which will be used to look at rebalanceStatus.
http://localhost:9000/rebalanceStatus/f83093b1-6b1e-47f3-8721-7984d442815d

Code changes:

  • Introduced a uniqueId for rebalance operation (rebalanceJobId). This is a required parameter passed along in the rebalanceConfig.
  • Introduced rebalanceStatus API to get rebalance status + stats, given the rebalanceJobId
  • Introduced Observer/callback for TableRebalancer that encapsulates the logging of rebalance status + stats to ZooKeeper. rebalanceJobId is the jobId to lookup the Zookeeper state, used by rebalanceStatus API. rebalanceId is part of rebalanceResult that's returned to the user when a rebalance is initiated.
  • The triggers to the observer in TableRebalancer to collect status + stats are the following
    1) At the start of rebalance when we have initial and target states. This will tell us amount of work that rebalance starts
    with.
    2) Rebalance phase that waits for external view to converge to ideal state. This will tell us the pending work for
    rebalance.
    3) Phase if/when ideal state changes due to other events and a new target is computed. This will tell us the work
    involved when ideal state changes multiple times during the rebalance.
  • Created new znode path for rebalance jobType.
  • Enhanced /table/tableName/jobs api to retrieve rebalance jobs as well. This is to discover rebalanceJobs, so as to get the jobId to look at status.
  • Applied pinotStyle codeStyle to files which made formatting changes to unrelated areas of code as well.

Testing

  • Unit test
  • Integration test after a rebalance is initiated - initialToTargetStateConvergence tells us total work when rebalance was initiated, externalViewToIdealStateConvergence tells us pending work, currentToTargetConvergence tells us if ideal state changed multiple time since rebalance started and how far is target from ideal state.
    {
    "tableRebalanceProgressStats": {
    "status": "DONE",
    "completionStatusMsg": "Finished rebalancing table: billing_OFFLINE with minAvailableReplicas: 1, enableStrictReplicaGroup: false, bestEfforts: false in 32 ms.",
    "initialToTargetStateConvergence": {
    "_segmentsMissing": 0,
    "_segmentsToRebalance": 1,
    "_percentSegmentsToRebalance": 100,
    "_replicasToRebalance": 9
    },
    "externalViewToIdealStateConvergence": {
    "_segmentsMissing": 0,
    "_segmentsToRebalance": 0,
    "_percentSegmentsToRebalance": 0,
    "_replicasToRebalance": 0
    },
    "currentToTargetConvergence": {
    "_segmentsMissing": 0,
    "_segmentsToRebalance": 0,
    "_percentSegmentsToRebalance": 0,
    "_replicasToRebalance": 0
    },
    "startTimeInMilliseconds": 1678214633564,
    "timeToFinishInSeconds": 0,
    },
    "timeElapsedSinceStartInSeconds": 9
    }

Output of http://localhost:9000/table/airlineStats_OFFLINE/jobs?type=OFFLINE

{
"178b0d79-92d8-43c7-9aa0-0448da01fa94": {
"jobId": "178b0d79-92d8-43c7-9aa0-0448da01fa94",
"messageCount": "10",
"submissionTimeMs": "1678302700785",
"jobType": "RELOAD_ALL_SEGMENTS",
"tableName": "airlineStats_OFFLINE"
},
"095855b6-5b9b-48d5-ad97-c7a2db2d1b7b": {
"jobId": "095855b6-5b9b-48d5-ad97-c7a2db2d1b7b",
"submissionTimeMs": "1678302711030",
"REBALANCE_PROGRESS_STATS": "{"status":"DONE","timeToFinishInSeconds":0,"completionStatusMsg":"Finished rebalancing table: airlineStats_OFFLINE with minAvailableReplicas: 1, enableStrictReplicaGroup: false, bestEfforts: false in 41 ms.","startTimeInMilliseconds":1.678302711006E12,"initialToTargetStateConvergence":{"_segmentsMissing":0,"_segmentsToRebalance":31,"_percentSegmentsToRebalance":100.0,"_replicasToRebalance":279},"externalViewToIdealStateConvergence":{"_segmentsMissing":0,"_segmentsToRebalance":0,"_percentSegmentsToRebalance":0.0,"_replicasToRebalance":0},"currentToTargetConvergence":{"_segmentsMissing":0,"_segmentsToRebalance":0,"_percentSegmentsToRebalance":0.0,"_replicasToRebalance":0}}",
"tableName": "airlineStats_OFFLINE"
}
}

@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 4 times, most recently from 569fd2b to d516fef Compare March 2, 2023 01:42
@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2023

Codecov Report

Merging #10359 (16dde72) into master (221db82) will decrease coverage by 56.37%.
The diff coverage is 52.65%.

@@              Coverage Diff              @@
##             master   #10359       +/-   ##
=============================================
- Coverage     70.30%   13.93%   -56.37%     
+ Complexity     6026      259     -5767     
=============================================
  Files          2049     2007       -42     
  Lines        111060   109281     -1779     
  Branches      16894    16692      -202     
=============================================
- Hits          78078    15230    -62848     
- Misses        27518    92814    +65296     
+ Partials       5464     1237     -4227     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 ?
unittests2 13.93% <52.65%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache/pinot/common/metadata/ZKMetadataProvider.java 0.00% <0.00%> (-67.68%) ⬇️
...mmon/metadata/controllerjob/ControllerJobType.java 0.00% <0.00%> (-100.00%) ⬇️
...ller/api/resources/PinotRealtimeTableResource.java 0.00% <0.00%> (-45.08%) ⬇️
...ler/api/resources/PinotSegmentRestletResource.java 9.94% <0.00%> (-28.56%) ⬇️
...pi/resources/ServerRebalanceJobStatusResponse.java 0.00% <0.00%> (ø)
...roller/helix/core/relocation/SegmentRelocator.java 47.19% <0.00%> (-19.48%) ⬇️
...ntroller/helix/core/PinotHelixResourceManager.java 62.14% <15.62%> (-9.03%) ⬇️
...oller/api/resources/PinotTableRestletResource.java 50.24% <23.25%> (-5.11%) ⬇️
...ntroller/helix/core/rebalance/TableRebalancer.java 69.60% <55.76%> (-2.03%) ⬇️
.../core/rebalance/ZkBasedTableRebalanceObserver.java 70.65% <70.65%> (ø)
... and 4 more

... and 1613 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 3 times, most recently from a40b207 to 9b233f6 Compare March 2, 2023 19:21
@swaminathanmanish swaminathanmanish changed the title (WIP....DRAFT] Rebalance API changes RebalanceStatus API changes Mar 2, 2023
@swaminathanmanish
Copy link
Contributor Author

swaminathanmanish commented Mar 2, 2023

@navina , @Jackie-Jiang - Could you take a look at the overall approach and provide feedback ?

@swaminathanmanish swaminathanmanish marked this pull request as ready for review March 2, 2023 22:46
@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 5 times, most recently from 2b5d0b7 to a84b407 Compare March 3, 2023 23:57
@Jackie-Jiang
Copy link
Contributor

It is great to list 3 different convergences. We might want to track different stats for each of them though.
When tracking the convergence, we want to track both the segments not converged, and how many replicas are different (we may track total number of replicas that are different from the 2 views).
For the targetToIdealStateConvergence, we should call it currentToTargetConvergence, and the target should be the re-computed one instead of the initial calculated one in case the IS changed while rebalancing is running.
For the timeElapsed info, let's use seconds and round it to whole number which is much easier to read

@swaminathanmanish
Copy link
Contributor Author

currentToTargetConvergence

Thanks @Jackie-Jiang for taking a look. Here's the summary of the discussion (review feedback)

  1. Use callback/observer in the TableRebalancer class. The code to track the rebalance status + stats in Zookeeper will be encapsulated in the observer that'll be invoked from TableRebalancer. This also provides the flexibility of plugging in different types of observers. We will have 3 types of triggers for the Observer (Start of rebalance, external view to ideal state convergence, ideal state changes).
  2. Track the number of replicas to converge in addition to segments/partitions to converge
  3. Rename 'targetToIdealStateConvergence' to 'currentToTargetConvergence'

@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 6 times, most recently from 35c1901 to 8110452 Compare March 7, 2023 19:23
@mayankshriv
Copy link
Contributor

Side question, do we allow multiple rebalance requests from user in parallel? Wondering if it is better to not allow a rebalance request while one is in progress?

@swaminathanmanish
Copy link
Contributor Author

Side question, do we allow multiple rebalance requests from user in parallel? Wondering if it is better to not allow a rebalance request while one is in progress?

Yes, today there's nothing preventing multiple rebalances to happen in parallel on the same table. This does not impact correctness as eventually the rebalance converges. I think it'll be good to find out what consequences (both on the user and system sides) this will have.

@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 2 times, most recently from 244a0f6 to aefe058 Compare March 15, 2023 04:33
@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 11 times, most recently from 87760a7 to c5f1d77 Compare March 16, 2023 19:48
@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 4 times, most recently from 27a3a6b to beb951a Compare March 17, 2023 04:18
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. Good job extracting the observer

@swaminathanmanish swaminathanmanish force-pushed the rebalanceApi branch 2 times, most recently from e65aac6 to 34fc482 Compare March 17, 2023 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants