-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add Pre-check for downtime and minAvailableReplica for Pauseless Tables #15953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
| } | ||
|
|
||
| if (PauselessConsumptionUtils.isPauselessEnabled(tableConfig)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's add a TODO here to revisit if this pre-check should be removed once improvements are made to rebalance to handle this scenario better, so that we don't forget to address this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
somandal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! thanks for the quick turnaround on this!
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #15953 +/- ##
============================================
+ Coverage 62.90% 63.43% +0.52%
+ Complexity 1386 1353 -33
============================================
Files 2867 2895 +28
Lines 163354 166337 +2983
Branches 24952 25439 +487
============================================
+ Hits 102755 105510 +2755
+ Misses 52847 52833 -14
- Partials 7752 7994 +242
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…es (apache#15953) * add precheck for pauseless table with downtime/minAvailableReplica=0 * add test for pauseless pre-check * lint: style * add comment and todo
Description
It was revealed a risk of data loss for pauseless tables during rebalance, when
downtime=trueorminAvailableReplicas=0.If a segment is being moved and has not yet uploaded to deep store, premature deletion could cause irrecoverable data loss.
This PR introduces pre-checks and warnings as a workaround to mitigate such scenarios -- Add to the pre-check logic for table rebalancing when "pauseless ingestion" is enabled yet the rebalance parameters have downtime=true
orminAvailableReplicas=0`, adding additional safety checks and warnings to prevent potential data loss.Key Changes
Enhance Pre-Check Logic:
"rebalanceConfigOptions"if:minAvailableReplicas=0for pauseless tables.Testing:
TableRebalancerClusterStatelessTestto cover new warning scenarios and validate correct pre-check status/messages for pauseless tables.Tests
Case 1: Pauseless table with
RF=1 -> RF=1, rebalanced from 2 servers to 1 server,downtime=false,minAvailableReplica=-1Case 2: Pauseless table with
RF=1 -> RF=2, rebalanced from 1 servers to 2 server,downtime=trueCase 3: Pauseless table with
RF=1 -> RF=2, rebalanced from 1 servers to 2 server,minAvailableReplica=-2Case 4: Pauseless table with
RF=1 -> RF=2, rebalanced from 1 servers to 2 server,minAvailableReplica=-1