Fix race in TestApiSwarmRestartCluster#25077
Conversation
|
Found a couple of more issues with this test, working on an update. |
In `TestApiSwarmRestartCluster`, it's calling `checkClusterHealth`. `checkClusterHealth` calls `d.info()`, which will return an error if there is no cluster leader... problem is `checkClusterHealth` is doing a nil error assertion w/o giving any time for a leader to be elected. This moves the `d.info()` call into a `waitAndAssert` using the default reconciliation timeout. It also moves some other checks into a `waitAndAssert` to give the cluster enough time to come back up. Signed-off-by: Brian Goff <[email protected]>
f0203c6 to
fdcde8b
Compare
| return true, nil | ||
| } | ||
| nn := d.getNode(c, n.ID) | ||
| n = *nn |
There was a problem hiding this comment.
I'm not sure I understand this. Why is state being checked before getting the node?
There was a problem hiding this comment.
because we already have the node, might as well check if its already ok before getting the node again,
|
LGTM |
1 similar comment
|
LGTM |
|
Starting a test daemon blocks until the control API becomes available, this should not happen before leader has been elected. If there is a reelection(not sure why it should be in this case) but there is no error the manager calls should block also. Debugging why reelection happened in this test already pointed us to at least one swarmkit bug in moby/swarmkit#1183 . If we don't do this every user of swarm API endpoints needs to write: That doesn't seem to be very user-friendly. I'm fine with patching the test for CI but we need to make sure we are not hiding actual bugs by working around them in our tests. |
In
TestApiSwarmRestartCluster, it's callingcheckClusterHealth.checkClusterHealthcallsd.info(), which will return an error ifthere is no cluster leader... problem is
checkClusterHealthis doing anil error assertion w/o giving any time for a leader to be elected.
This moves the
d.info()call into awaitAndAssertusing the defaultreconciliation timeout.
Fixes #24590