Skip to content

Conversation

@tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Jul 15, 2019

relates to / addresses:

SwarmKit bump:

full diff: moby/swarmkit@fb584e7...7dded76

includes:

@thaJeztah
Copy link
Member

DockerSwarmSuite.TestAPISwarmRaftQuorum failed on Janky and Experimental

DockerSwarmSuite.TestAPISwarmServicesCreate also failed on Experimental

@codecov
Copy link

codecov bot commented Jul 16, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@42a0544). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #39531   +/-   ##
=========================================
  Coverage          ?   37.31%           
=========================================
  Files             ?      609           
  Lines             ?    45234           
  Branches          ?        0           
=========================================
  Hits              ?    16881           
  Misses            ?    26062           
  Partials          ?     2291

@tonistiigi
Copy link
Member Author

Pushed updates for: TestAPISwarmLeaderElection and TestAPISwarmRaftQuorum. TestAPISwarmLeaderElection now allows temporary context deadline exceeded and no-leader errors while the leader is switching. Ideally, these should never happen and we can create an issue to track if it is possible to guarantee that. The test still checks that leader election eventually happens and cluster state normalizes. For TestAPISwarmRaftQuorum I increased the timeout from default 30s like some other tests do as well. Probably for similar reasons as TestAPISwarmLeaderElection it sometimes gets deadline exceeded errors and when this happens it should have been already waited for 20sec, plus possible wait for heartbeats. If this doesn't help then temporarily I wouldn't mind allowing deadline exceeded error as well in addition to the specific error. More important part of the test is to check that the state normalizes.

For TestAPISwarmServicesCreate this seems actual swarm error with no workaround in the test. In the logs I see.

time="2019-07-15T22:22:17.270689016Z" level=error msg="failed transaction: update task desired state to REMOVE" error="update out of sequence" module=node node.id=puuje6xefqjhl5k42bfmqgh9v
time="2019-07-15T22:22:17.270715610Z" level=error msg="failed transaction: update task desired state to REMOVE" error="update out of sequence" module=node node.id=puuje6xefqjhl5k42bfmqgh9v

when it tries to remove a service.

@tonistiigi
Copy link
Member Author

In experimental TestServicePlugin now failed from integration suite. The cause seems same as TestAPISwarmServicesCreate before.

time="2019-07-16T03:20:19.393117770Z" level=error msg="failed transaction: update task desired state to REMOVE" error="update out of sequence" module=node node.id=740tzgyr852ufnycchjw39kbp
time="2019-07-16T03:20:19.393133366Z" level=error msg="failed transaction: update task desired state to REMOVE" error="update out of sequence" module=node node.id=740tzgyr852ufnycchjw39kbp
time="2019-07-16T03:20:19.393148882Z" level=error msg="failed transaction: update task desired state to REMOVE" error="update out of sequence" module=node node.id=740tzgyr852ufnycchjw39kbp

If I understand correctly this is a completely internal error in swarm that causes the state change to be ignored with no way to control or get an error from the client.

All of these are covered in https://gist.github.com/tonistiigi/849f6551003fff2da43cbe0e21172e23

@dperny
Copy link
Contributor

dperny commented Jul 16, 2019

Opened a PR to fix the Update out of sequence error: moby/swarmkit#2870

@tonistiigi
Copy link
Member Author

Reverted memberlist change to check if it has an effect on tests or can be worked on separately.

@tonistiigi tonistiigi changed the title integration-cli: check memberlist bindport in ci integration-cli: fix swarm tests flakiness Jul 17, 2019
@tonistiigi tonistiigi marked this pull request as ready for review July 17, 2019 20:26
Copy link
Contributor

@andrewhsu andrewhsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w.r.t. reducing the flakiness of tests

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crosbymichael
Copy link
Contributor

LGTM

thaJeztah added a commit to thaJeztah/docker that referenced this pull request Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby#38103 / docker-archive#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby/moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive/engine#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby/moby#38103 / docker-archive/engine#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby/moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: f7dbee3eeaa1dd218116f85b8f60361acbd5b214
Component: engine
for _, d := range nodes {
if d.GetNode(c, d.NodeID()).ManagerStatus.Leader {
n := d.GetNode(c, d.NodeID(), func(err error) bool {
if strings.Contains(errors.Cause(err).Error(), context.DeadlineExceeded.Error()) || strings.Contains(err.Error(), "swarm does not have a leader") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should update this to take the same approach as #39168

@thaJeztah thaJeztah added this to the 20.03.0 milestone Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants