
Conversation

@thaJeztah (Member) commented Sep 9, 2019

Relates to #38885 (Flaky test: DockerSwarmSuite.TestSwarmClusterRotateUnlockKey)

This test was updated in b79adac (#39616), but is still flaky; see #39883 (comment):

```
20:24:13  FAIL: docker_cli_swarm_test.go:1333: DockerSwarmSuite.TestSwarmClusterRotateUnlockKey
20:24:13
20:24:13  Creating a new daemon at: /go/src/github.com/docker/docker/bundles/test-integration/3/DockerSwarmSuite.TestSwarmClusterRotateUnlockKey
20:24:13  [d6f95e679cb65] waiting for daemon to start
20:24:13  [d6f95e679cb65] waiting for daemon to start
20:24:13  [d6f95e679cb65] daemon started
20:24:13
20:24:13  Creating a new daemon at: /go/src/github.com/docker/docker/bundles/test-integration/3/DockerSwarmSuite.TestSwarmClusterRotateUnlockKey
20:24:13  [d204a02ba4780] waiting for daemon to start
20:24:13  [d204a02ba4780] waiting for daemon to start
20:24:13  [d204a02ba4780] daemon started
20:24:13
20:24:13  [d204a02ba4780] joining swarm manager [d6f95e679cb65]@0.0.0.0:2477, swarm listen addr 0.0.0.0:2478
20:24:13  Creating a new daemon at: /go/src/github.com/docker/docker/bundles/test-integration/3/DockerSwarmSuite.TestSwarmClusterRotateUnlockKey
20:24:13  [d873d6a842829] waiting for daemon to start
20:24:13  [d873d6a842829] waiting for daemon to start
20:24:13  [d873d6a842829] daemon started
20:24:13
20:24:13  [d873d6a842829] joining swarm manager [d6f95e679cb65]@0.0.0.0:2477, swarm listen addr 0.0.0.0:2479
20:24:13  [d204a02ba4780] Stopping daemon
20:24:13  [d204a02ba4780] exiting daemon
20:24:13  [d204a02ba4780] Daemon stopped
20:24:13  [d204a02ba4780] waiting for daemon to start
20:24:13  [d204a02ba4780] waiting for daemon to start
20:24:13  [d204a02ba4780] daemon started
20:24:13
20:24:13  [d873d6a842829] Stopping daemon
20:24:13  [d873d6a842829] exiting daemon
20:24:13  [d873d6a842829] Daemon stopped
20:24:13  [d873d6a842829] waiting for daemon to start
20:24:13  [d873d6a842829] waiting for daemon to start
20:24:13  [d873d6a842829] daemon started
20:24:13
20:24:13  docker_cli_swarm_test.go:1413:
20:24:13      c.Assert(err, checker.IsNil, check.Commentf("%s", outs))
20:24:13  ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc000934240), Stderr:[]uint8(nil)} ("exit status 1")
20:24:13  ... Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
20:24:13
20:24:13
20:24:13  [d6f95e679cb65] Stopping daemon
20:24:13  [d6f95e679cb65] exiting daemon
20:24:13  [d6f95e679cb65] Daemon stopped
20:24:13  [d204a02ba4780] Stopping daemon
20:24:13  [d204a02ba4780] exiting daemon
20:24:13  [d204a02ba4780] Daemon stopped
20:24:13  [d873d6a842829] Stopping daemon
20:24:13  [d873d6a842829] exiting daemon
20:24:13  [d873d6a842829] Daemon stopped
```

The interesting bit is that the retry loop should sleep for 3 seconds before retrying, yet in the failure above the test started and failed within a second. That means a different error / output was returned, presumably one that the retry condition does not match, so the loop never slept.
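
For context, the retry around `node ls` in that test has roughly the shape sketched below. This is an approximation (the `d`, `c`, `checker`, and `check` identifiers come from the integration-cli harness, and the exact retry condition may differ): the 3-second sleep only happens when the retry condition recognises the output, so an error that slips past that condition hits the assertion immediately, which would match the timing seen above.

```go
// Approximate sketch of the retry loop around "docker node ls" in
// TestSwarmClusterRotateUnlockKey; names and the exact matched string
// are assumptions, not a verbatim copy of the test.
retry := 0
for {
	outs, err := d.Cmd("node", "ls")
	// Only a recognised "no leader" message is retried after a
	// 3-second sleep; anything else reaches the assertion right away.
	if err != nil && retry < 5 && strings.Contains(outs, "swarm does not have a leader") {
		retry++
		time.Sleep(3 * time.Second)
		continue
	}
	c.Assert(err, checker.IsNil, check.Commentf("%s", outs))
	break
}
```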

This patch adds some additional debugging to that test, to see if we can catch why it is still flaky.
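
Purely as an illustration of the kind of debugging meant here (a sketch, not necessarily what this patch does): logging the raw error and output of every `node ls` attempt makes the unexpected value visible in the CI log.

```go
// Hypothetical debug logging inside the retry loop (sketch only):
// record what "node ls" actually returned on each attempt.
outs, err := d.Cmd("node", "ls")
if err != nil {
	c.Logf("attempt %d: 'node ls' failed: %v\noutput:\n%s", retry, err, outs)
}
```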

@thaJeztah (Member, Author) commented

Logs from that failing test:

d6f95e679cb65.log
d204a02ba4780.log
d873d6a842829.log

@thaJeztah (Member, Author) commented Sep 9, 2019

Failure on RS1 is #39857:

https://ci.docker.com/public/blue/rest/organizations/jenkins/pipelines/moby/branches/PR-39885/runs/1/nodes/200/log/?start=0

```
[2019-09-09T20:37:36.900Z] ok  	github.com/docker/docker/daemon/logger	0.525s	coverage: 43.0% of statements
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:35Z" level=info msg="Trying to get region from EC2 Metadata"
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:36Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName= logStreamName= message= origError="<nil>"
[2019-09-09T20:37:36.900Z] --- FAIL: TestLogBlocking (0.02s)
[2019-09-09T20:37:36.900Z]     cloudwatchlogs_test.go:313: Expected to be able to read from stream.messages but was unable to
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:36Z" level=error msg=Error
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:36Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=groupName logStreamName=streamName message="use token token" origError="<nil>"
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:36Z" level=error msg="Failed to put log events" errorCode=DataAlreadyAcceptedException logGroupName=groupName logStreamName=streamName message="use token token" origError="<nil>"
[2019-09-09T20:37:36.900Z] time="2019-09-09T20:37:36Z" level=info msg="Data already accepted, ignoring error" errorCode=DataAlreadyAcceptedException logGroupName=groupName logStreamName=streamName message="use token token"
[2019-09-09T20:37:36.900Z] FAIL
[2019-09-09T20:37:36.900Z] coverage: 78.2% of statements
[2019-09-09T20:37:36.900Z] FAIL	github.com/docker/docker/daemon/logger/awslogs	0.630s
```
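
The "Expected to be able to read from stream.messages but was unable to" failure is the classic shape of a test that reads from a channel under a short timeout. The sketch below is illustrative only (assumed names and timeout, not the actual awslogs test): if the CI worker is busy, the goroutine that writes to the channel may not be scheduled before `time.After` fires, which is a common source of this kind of flake.

```go
package sketch

import (
	"testing"
	"time"
)

// Illustrative only: a goroutine delivers a message on an unbuffered
// channel, and the test requires that it arrives within a short window.
// On a loaded machine, the timeout case can win the select instead.
func TestReadWithTimeout(t *testing.T) {
	messages := make(chan string)
	go func() {
		messages <- "hello"
	}()
	select {
	case <-messages:
		// message delivered in time
	case <-time.After(30 * time.Millisecond):
		t.Fatal("Expected to be able to read from messages but was unable to")
	}
}
```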

@thaJeztah (Member, Author) commented

rebased; @cpuguy83 @tiborvass ptal

@thaJeztah force-pushed the debug_flaky_TestSwarmClusterRotateUnlockKey branch from d3ab0ee to 78d137d on September 19, 2019, 13:01
@thaJeztah (Member, Author) commented

rebased
