Fix race in `TestApiSwarmRestartCluster` by cpuguy83 · Pull Request #25077 · moby/moby

cpuguy83 · 2016-07-26T15:49:11Z

TestApiSwarmRestartCluster calls checkClusterHealth.
checkClusterHealth calls d.info(), which returns an error if
there is no cluster leader. The problem is that checkClusterHealth asserts
a nil error without giving a leader any time to be elected.

This moves the d.info() call into a waitAndAssert using the default
reconciliation timeout.

Fixes #24590
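
To illustrate the change described above, here is a minimal, self-contained sketch of polling a health call until it succeeds or a timeout expires. It is not the test suite's actual `waitAndAssert`; `waitFor`, `checkFunc`, `fakeDaemon`, and the timeout values are hypothetical names chosen for the example, and the fake daemon simply reports "no cluster leader" for its first few calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// checkFunc reports whether the condition holds yet; a non-nil error aborts the wait.
type checkFunc func() (bool, error)

// waitFor polls check every interval until it returns true, returns an error,
// or the timeout elapses. (Hypothetical helper, not the suite's waitAndAssert.)
func waitFor(timeout, interval time.Duration, check checkFunc) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("condition not met before timeout")
		}
		time.Sleep(interval)
	}
}

// fakeDaemon stands in for a swarm manager that gains a leader only after a few polls.
type fakeDaemon struct{ pollsUntilLeader int }

// info fails with a "no cluster leader" error until the countdown reaches zero.
func (d *fakeDaemon) info() error {
	if d.pollsUntilLeader > 0 {
		d.pollsUntilLeader--
		return errors.New("no cluster leader")
	}
	return nil
}

func main() {
	d := &fakeDaemon{pollsUntilLeader: 3}
	// Treat an info() error as "not ready yet" instead of failing immediately.
	err := waitFor(time.Second, 10*time.Millisecond, func() (bool, error) {
		return d.info() == nil, nil
	})
	fmt.Println("cluster healthy:", err == nil)
}
```

In this sketch every `info()` error counts as "not ready yet"; a real check would probably want to distinguish a missing leader from other failures.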

cpuguy83 · 2016-07-26T17:56:07Z

Found a couple more issues with this test; working on an update.

`TestApiSwarmRestartCluster` calls `checkClusterHealth`. `checkClusterHealth` calls `d.info()`, which returns an error if there is no cluster leader. The problem is that `checkClusterHealth` asserts a nil error without giving a leader any time to be elected.

This moves the `d.info()` call into a `waitAndAssert` using the default reconciliation timeout. It also moves some other checks into a `waitAndAssert` to give the cluster enough time to come back up.

Signed-off-by: Brian Goff <[email protected]>

dnephin · 2016-07-27T14:26:13Z

+					return true, nil
+				}
+				nn := d.getNode(c, n.ID)
+				n = *nn


I'm not sure I understand this. Why is state being checked before getting the node?

Because we already have the node, we might as well check whether it's already OK before getting the node again.
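
The ordering discussed in this review thread can be sketched as a poll function that first inspects the copy of the node it already holds and only fetches a fresh copy when that one is not ready. This is an illustration, not the code from the diff: `node`, `fakeCluster`, `readyCheck`, and the `"ready"`/`"down"` state strings are hypothetical stand-ins.

```go
package main

import "fmt"

// node is a hypothetical, trimmed-down stand-in for the swarm node type.
type node struct {
	ID    string
	State string
}

// fakeCluster reports a node as "down" until enough lookups have been made.
type fakeCluster struct {
	lookups    int // number of getNode calls made so far
	readyAfter int // lookups needed before the node reports "ready"
}

// getNode returns a fresh copy of the node and counts the lookup.
func (c *fakeCluster) getNode(id string) *node {
	c.lookups++
	state := "down"
	if c.lookups >= c.readyAfter {
		state = "ready"
	}
	return &node{ID: id, State: state}
}

// readyCheck returns a poll function that checks the node copy it already
// holds and refreshes it only when that copy is not ready.
func readyCheck(c *fakeCluster, n node) func() bool {
	return func() bool {
		if n.State == "ready" {
			return true
		}
		nn := c.getNode(n.ID)
		n = *nn
		return false
	}
}

func main() {
	c := &fakeCluster{readyAfter: 2}
	check := readyCheck(c, node{ID: "n1", State: "down"})
	polls := 0
	for !check() {
		polls++
	}
	fmt.Println("polls:", polls, "lookups:", c.lookups)
}
```

With this ordering, a node that is already ready when the wait starts costs no extra lookup at all.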

dnephin · 2016-07-27T14:39:44Z

LGTM

crosbymichael · 2016-07-28T17:15:29Z

LGTM

tonistiigi · 2016-07-31T19:07:40Z

Starting a test daemon blocks until the control API becomes available; this should not happen before a leader has been elected. If there is a reelection (not sure why there would be one in this case) but there is no error, the manager calls should block as well. Debugging why reelection happened in this test already pointed us to at least one swarmkit bug in moby/swarmkit#1183.

If we don't do this every user of swarm API endpoints needs to write:

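for {
  if err := apicall(); err != nil {
    if isNoLeader(err) {
      // No leader yet: wait, then try the call again.
      backoffDelay()
      continue
    }
    return err
  }
  // The call succeeded, so stop retrying.
  return nil
}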

That doesn't seem to be very user-friendly.
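
Spelled out as a runnable program, the client-side workaround described above might look roughly like the sketch below. Everything here is an assumption made for illustration: `errNoLeader` is a hypothetical sentinel (the real API's error for a missing leader may look different), and `retryOnNoLeader`, the attempt limit, and the doubling delay are not part of any Docker or swarmkit API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoLeader is a hypothetical sentinel for "the cluster has no leader".
var errNoLeader = errors.New("no cluster leader")

// retryOnNoLeader calls fn until it succeeds, fails with a different error,
// or the attempts run out. The delay doubles after each no-leader failure.
func retryOnNoLeader(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errNoLeader) {
			return err // success (nil) or an unrelated error
		}
		time.Sleep(delay)
		delay *= 2
	}
	return err
}

func main() {
	calls := 0
	// The fake API call fails twice with errNoLeader, then succeeds.
	err := retryOnNoLeader(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNoLeader
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Every caller of a swarm endpoint would need some variant of this wrapper, which is the usability cost the comment is pointing at.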

I'm fine with patching the test for CI but we need to make sure we are not hiding actual bugs by working around them in our tests.

GordonTheTurtle added the status/0-triage label Jul 26, 2016

cpuguy83 mentioned this pull request Jul 26, 2016

Flaky test: DockerSwarmSuite.TestApiSwarmRestartCluster #24590

Closed

cpuguy83 added status/2-code-review and removed status/0-triage labels Jul 26, 2016

cpuguy83 force-pushed the fix_TestApiSwarmRestartCluster branch from f0203c6 to fdcde8b Compare July 26, 2016 18:38

cpuguy83 mentioned this pull request Jul 27, 2016

[1.12.0-rc] Swarm tests indicate potential races #25101

Closed

dnephin reviewed Jul 27, 2016

crosbymichael merged commit 2620635 into moby:master Jul 28, 2016

icecrime mentioned this pull request Jul 28, 2016

Check if the container is running if no event #25178

Merged

tiborvass mentioned this pull request Jul 28, 2016

Vendoring libnetwork to 1.12.0-bump branch to avoid a deadlock #25191

Merged

cpuguy83 deleted the fix_TestApiSwarmRestartCluster branch August 1, 2016 19:54

thaJeztah mentioned this pull request Jun 27, 2019

[epic] flaky tests #37306

Open
