Fix expectations for manual failover tests by amanosme · Pull Request #2453 · valkey-io/valkey

amanosme · 2025-08-07T21:48:02Z

Test `Instance #5 is still a slave after some time (no failover)` is supposed to verify that `CLUSTER FAILOVER` will not promote a replica without quorum from the primary; later in the file, `Instance 5 is a master after some time` verifies that `CLUSTER FAILOVER FORCE` does promote a replica under the same conditions.

There are a couple of issues with the tests:

1. `Instance #5 is still a slave after some time (no failover)` should verify that instance 5 is still a replica (i.e. that no failover happened), but it calls `assert {[s -5 role] eq {master}}`.
2. That assert passes only because the test previously sends `DEBUG SLEEP 10` to the primary, which pauses it for longer than the configured `cluster-node-timeout` of 3 seconds. The rest of the cluster marks the primary as failed, so quorum can be established and instance 5 is promoted to primary.

This commit fixes both issues by shortening the sleep to less than 3 seconds and then asserting that the role is still replica. Test `Instance #5 is a master after some time` is also updated to sleep for a shorter duration, to ensure that `FAILOVER FORCE` succeeds under the exact same conditions.
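
To make the timing argument concrete, here is a rough Python model of the reasoning in this description. It is not the test code and not how the server detects failures (real detection involves gossip and agreement among nodes); it only encodes the comparison between the sleep duration and `cluster-node-timeout`. The function names, the `"primary"`/`"replica"` labels, and the 2-second sleep are all hypothetical; the description only says the new sleep is "less than 3 seconds".

```python
# Simplified model of the timing argument above. Assumption: the primary is
# treated as failed exactly when its pause exceeds cluster-node-timeout.
NODE_TIMEOUT_S = 3.0  # cluster-node-timeout used by the test, per the description


def primary_marked_failed(sleep_s: float, node_timeout_s: float = NODE_TIMEOUT_S) -> bool:
    """Return True if the pause is long enough for the cluster to flag the primary."""
    return sleep_s > node_timeout_s


def expected_role(sleep_s: float, force: bool, node_timeout_s: float = NODE_TIMEOUT_S) -> str:
    """Role we expect instance 5 to report after the wait (model labels only)."""
    if force:
        # CLUSTER FAILOVER FORCE is expected to promote the replica
        # even though the primary is unresponsive.
        return "primary"
    if primary_marked_failed(sleep_s, node_timeout_s):
        # The old test landed here: the primary was flagged as failed,
        # so the replica was promoted despite the plain CLUSTER FAILOVER.
        return "primary"
    # Primary paused but not flagged as failed: no promotion.
    return "replica"


if __name__ == "__main__":
    # (sleep seconds, FORCE?) -- 2.0 is a hypothetical value below the timeout.
    for sleep_s, force in [(10.0, False), (2.0, False), (2.0, True)]:
        print(f"sleep={sleep_s}s force={force} -> {expected_role(sleep_s, force)}")
```

In this model the old 10-second sleep yields a promotion even without `FORCE`, which is why the original `{master}` assertion passed; a sleep shorter than the timeout isolates the behavior each test is meant to check.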

### Testing

`./runtest --single unit/cluster/manual-failover --loop --fastfail`

codecov · 2025-08-07T22:05:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.56%. Comparing base (7bbf523) to head (f34af45).
⚠️ Report is 1 commit behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2453      +/-   ##
============================================
+ Coverage     71.51%   71.56%   +0.04%     
============================================
  Files           125      125              
  Lines         69214    69214              
============================================
+ Hits          49499    49530      +31     
+ Misses        19715    19684      -31

see 9 files with indirect coverage changes

Signed-off-by: Tyler Amano-Smerling <[email protected]>

enjoy-binbin

This seems correct. Did you encounter this failure somewhere, or did you find it while doing a code review?

amanosme · 2025-08-08T08:07:03Z

@enjoy-binbin I caught this while debugging the test under valgrind. I think valgrind slows the tests down enough that the primary would be back up by the time the `CLUSTER FAILOVER` command was issued, so the check for the master role failed.

amanosme force-pushed the amanosme-unstable branch from 51adff5 to cf42ba8 Compare August 7, 2025 21:49

sarthakaggarwal97 approved these changes Aug 7, 2025

amanosme force-pushed the amanosme-unstable branch from cf42ba8 to 5dd1833 Compare August 7, 2025 22:55

Fix expectations for manual failover tests

e2f7960

Signed-off-by: Tyler Amano-Smerling <[email protected]>

amanosme force-pushed the amanosme-unstable branch from 5dd1833 to e2f7960 Compare August 7, 2025 23:21

amanosme marked this pull request as draft August 7, 2025 23:22

amanosme marked this pull request as ready for review August 7, 2025 23:39

enjoy-binbin approved these changes Aug 8, 2025

Merge branch 'valkey-io:unstable' into amanosme-unstable

f34af45

enjoy-binbin merged commit de7bb61 into valkey-io:unstable Aug 8, 2025
52 checks passed

amanosme deleted the amanosme-unstable branch August 8, 2025 08:11

enjoy-binbin merged 2 commits into valkey-io:unstable from amanosme:amanosme-unstable
