Fix expectations for manual failover tests#2453
Merged
enjoy-binbin merged 2 commits intovalkey-io:unstablefrom Aug 8, 2025
Merged
Fix expectations for manual failover tests#2453enjoy-binbin merged 2 commits intovalkey-io:unstablefrom
enjoy-binbin merged 2 commits intovalkey-io:unstablefrom
Conversation
51adff5 to
cf42ba8
Compare
sarthakaggarwal97
approved these changes
Aug 7, 2025
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2453 +/- ##
============================================
+ Coverage 71.51% 71.56% +0.04%
============================================
Files 125 125
Lines 69214 69214
============================================
+ Hits 49499 49530 +31
+ Misses 19715 19684 -31 🚀 New features to boost your workflow:
|
cf42ba8 to
5dd1833
Compare
Signed-off-by: Tyler Amano-Smerling <[email protected]>
5dd1833 to
e2f7960
Compare
enjoy-binbin
approved these changes
Aug 8, 2025
Member
enjoy-binbin
left a comment
There was a problem hiding this comment.
This seem corret, did you encounter this failure somewhere? Or did you just doing a code review and find this?
Contributor
Author
|
@enjoy-binbin I caught this while debugging the test under valgrind. I think valgrind slows the tests down enough to the point where the primary would be back up when the |
allenss-amazon
pushed a commit
to allenss-amazon/valkey-core
that referenced
this pull request
Aug 19, 2025
Test `Instance valkey-io#5 is still a slave after some time (no failover)` is supposed to verify that command `CLUSTER FAILOVER` will not promote a replica without quorum from the primary; later in the file (`Instance 5 is a master after some time`), we verify that `CLUSTER FAILOVER FORCE` does promote a replica under the same conditions. There's a couple issues with the tests: 1. `Instance valkey-io#5 is still a slave after some time (no failover)` should verify that instance 5 is a replica (i.e. that there's no failover), but we call `assert {[s -5 role] eq {master}}`. 2. The reason why the above assert works is that we previously send `DEBUG SLEEP 10` to the primary, which pauses the primary for longer than the configured 3 seconds for`cluster-node-timeout`. The primary is marked as failed from the perspective of the rest of the cluster, so quorum can be established and instance 5 is promoted as primary. This commit fixes the two by shortening the sleep to less than 3 seconds, and then asserting the role is still replica. Test `Instance valkey-io#5 is a master after some time` is updated to sleep for a shorter duration to ensure that `FAILOVER FORCE` succeeds under the exact same conditions. ### Testing `./runtest --single unit/cluster/manual-failover --loop --fastfail` Signed-off-by: Tyler Amano-Smerling <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Test
Instance #5 is still a slave after some time (no failover)is supposed to verify that commandCLUSTER FAILOVERwill not promote a replica without quorum from the primary; later in the file (Instance 5 is a master after some time), we verify thatCLUSTER FAILOVER FORCEdoes promote a replica under the same conditions.There's a couple issues with the tests:
Instance #5 is still a slave after some time (no failover)should verify that instance 5 is a replica (i.e. that there's no failover), but we callassert {[s -5 role] eq {master}}.DEBUG SLEEP 10to the primary, which pauses the primary for longer than the configured 3 seconds forcluster-node-timeout. The primary is marked as failed from the perspective of the rest of the cluster, so quorum can be established and instance 5 is promoted as primary.This commit fixes the two by shortening the sleep to less than 3 seconds, and then asserting the role is still replica. Test
Instance #5 is a master after some timeis updated to sleep for a shorter duration to ensure thatFAILOVER FORCEsucceeds under the exact same conditions.Testing
./runtest --single unit/cluster/manual-failover --loop --fastfail