Deflake replica selection test by relaxing cluster configurations #2672
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## unstable #2672 +/- ##
============================================
+ Coverage 72.18% 72.62% +0.44%
============================================
Files 128 128
Lines 70994 71273 +279
============================================
+ Hits 51246 51762 +516
+ Misses 19748 19511 -237
zuiderkwast left a comment
Do you think it will help or is it a wild guess?
3d1a7a3 to 0d3b28f
@zuiderkwast The test has failed only under valgrind in the past couple of weeks, so it is likely related to the general slowness of valgrind. Also, I have a few passing valgrind runs in my local repo after this change, so it should work!
enjoy-binbin left a comment
5000 200 is quite a huge timeout and looks odd to me. We have a lot of similar cluster tests (I believe) in the daily run; have you measured its testing time in the daily CI? Do you think adjusting cluster-ping-interval and cluster-node-timeout would help?
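To put the "5000 200" figure in perspective, here is a minimal Python sketch of a polling wait. It assumes the two numbers are a maximum number of tries and a delay in milliseconds between tries; the thread does not spell this out, and the function names below are hypothetical, not the test suite's own Tcl helper.

```python
import time
from typing import Callable


def worst_case_wait_seconds(max_tries: int, delay_ms: int) -> float:
    """Upper bound on time spent sleeping if the condition never becomes true."""
    return max_tries * delay_ms / 1000.0


def wait_for_condition(
    max_tries: int,
    delay_ms: int,
    predicate: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` up to `max_tries` times, sleeping `delay_ms` between tries.

    Returns True as soon as the predicate holds, False if every try fails.
    The `sleep` parameter is injectable so the loop can be exercised without
    actually waiting.
    """
    for _ in range(max_tries):
        if predicate():
            return True
        sleep(delay_ms / 1000.0)
    return False


if __name__ == "__main__":
    # Assumed reading of "5000 200": 5000 tries, 200 ms apart.
    print(worst_case_wait_seconds(5000, 200))
```

Under that assumption the worst case is 1000 seconds, a bit under 17 minutes, of waiting for a single condition that never holds, which is why tuning the cluster timing settings for this one test is the lighter-weight alternative being suggested.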
The valgrind tests take about 3 hrs 50 mins, similar to what we see in the daily tests too. Let me explore cluster-ping-interval and cluster-node-timeout.
0d3b28f to 65d857e
@enjoy-binbin I think your suggestion has worked. I somehow didn't notice that the values for ping interval and node timeout are lower by default. Just increasing them for this test has gotten me 2-3 successful runs in a row.
Signed-off-by: Sarthak Aggarwal <[email protected]>
65d857e to 2525015
enjoy-binbin left a comment
Just increasing for this test have gotten me 2-3 successful runs together.
Thanks, please try running it a few more times before we merge it.
@enjoy-binbin the test has been green for the last 6 runs in my local repo!
…lkey-io#2672)
We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that the cluster has enough time to stabilize and propagate changes.
Fixes this occasional test failure when running with valgrind:
[err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl
#10 didn't became slave of #5
Signed-off-by: Sarthak Aggarwal <[email protected]>
)
We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that the cluster has enough time to stabilize and propagate changes.
Fixes this occasional test failure when running with valgrind:
[err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl
#10 didn't became slave of #5
Backported to the 9.0 branch in #2731.
Signed-off-by: Sarthak Aggarwal <[email protected]>
We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that the cluster has enough time to stabilize and propagate changes.
Today's failed test run: https://github.com/valkey-io/valkey/actions/runs/18179260254/job/51751751729#step:6:11262