Since commit 17173ef, checkSwarmLockedToUnlocked() no longer requires its third argument, so remove it. Signed-off-by: Kir Kolyshkin <[email protected]>
1. Using MNT_FORCE flag does not make sense for nsfs. Using MNT_DETACH though might help.
2. When -check.vv is added to TESTFLAGS, there are a lot of messages like this one:
   > unmount of /tmp/dxr/d847fd103a4ba/netns failed: invalid argument
   and some like
   > unmount of /tmp/dxr/dd245af642d94/netns failed: no such file or directory
   The first one means directory is not a mount point, the second one means it's gone. Do ignore both of these.
Signed-off-by: Kir Kolyshkin <[email protected]>
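A minimal sketch of the error handling described in this commit message, not the code from the PR itself. The names `cleanupNetns`, `ignorableUnmountError`, and `unmountFunc` are hypothetical. The unmount call is injected so the sketch can run without root; on Linux it would typically be `unix.Unmount` from golang.org/x/sys/unix, called with `MNT_DETACH`.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// unmountFunc has the same shape as unix.Unmount, so the real syscall
// can be swapped for a fake when experimenting.
type unmountFunc func(target string, flags int) error

// mntDetach is the Linux MNT_DETACH value (lazy unmount). It is defined
// locally so that the sketch compiles on any platform.
const mntDetach = 0x2

// Named errors for the two cases the commit message wants to ignore,
// plus one that should still be reported (used only in the demo).
var (
	errNotMountPoint error = syscall.EINVAL // "invalid argument"
	errGone          error = syscall.ENOENT // "no such file or directory"
	errBusy          error = syscall.EBUSY
)

// ignorableUnmountError reports whether err only means that there was
// nothing to unmount.
func ignorableUnmountError(err error) bool {
	return errors.Is(err, errNotMountPoint) || errors.Is(err, errGone)
}

// cleanupNetns lazily unmounts path and swallows the two benign errors.
func cleanupNetns(path string, unmount unmountFunc) error {
	err := unmount(path, mntDetach)
	if err == nil || ignorableUnmountError(err) {
		return nil
	}
	return fmt.Errorf("unmount of %s failed: %w", path, err)
}

func main() {
	// fake returns an unmount function that always fails with e.
	fake := func(e error) unmountFunc {
		return func(string, int) error { return e }
	}
	fmt.Println(cleanupNetns("/tmp/example/netns", fake(errNotMountPoint)))
	fmt.Println(cleanupNetns("/tmp/example/netns", fake(errGone)))
	fmt.Println(cleanupNetns("/tmp/example/netns", fake(errBusy)))
}
```

Only the last call in `main` returns a non-nil error, since "busy" is neither of the two ignored cases.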
A timer is leaking on every daemon start and stop. Probably nothing major, but given the amount of daemon starts/stops during tests, it's better to be accurate about it. Signed-off-by: Kir Kolyshkin <[email protected]>
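The commit message does not say where the timer comes from. As an illustration only, a common source of this kind of leak in Go is `time.After` inside a `select`: before Go 1.23, its timer was not collected until it fired, even when the other case won. A hedged sketch of the usual remedy, with the hypothetical helper name `waitOrTimeout`:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the wait exceeds its deadline.
var errTimeout = errors.New("timed out waiting for daemon")

// waitOrTimeout waits for a result on done for at most d.
// The explicit timer is stopped on return, so nothing stays pending
// when done wins the race.
func waitOrTimeout(done <-chan error, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case err := <-done:
		return err
	case <-timer.C:
		return errTimeout
	}
}

func main() {
	done := make(chan error, 1)
	done <- nil // pretend the daemon has already exited cleanly
	fmt.Println(waitOrTimeout(done, time.Minute))
}
```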
98e0388 to eadb71b
|
OK the failure in
happens (occasionally) because d1 (which is the leader) loses contact with both d2 and d3, so it loses quorum and steps down from leader to follower:
what exactly causes that is not yet clear to me... |
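To spell out the quorum arithmetic behind this comment: swarm managers use Raft, where a leader needs a strict majority of managers. The sketch below assumes d1, d2 and d3 are the only three managers; the function names are hypothetical.

```go
package main

import "fmt"

// quorum returns the smallest number of managers that forms a strict majority.
func quorum(managers int) int {
	return managers/2 + 1
}

// hasQuorum reports whether a node that can reach `reachable` managers
// (counting itself) still holds a majority.
func hasQuorum(reachable, managers int) bool {
	return reachable >= quorum(managers)
}

func main() {
	const managers = 3 // assumed: d1, d2, d3

	fmt.Println("quorum:", quorum(managers))
	fmt.Println("d1 sees d2 and d3:", hasQuorum(3, managers))
	fmt.Println("d1 sees only itself:", hasQuorum(1, managers))
}
```

With three managers the majority is two, so a leader that can reach only itself no longer has quorum, which matches the step-down described in the comment.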
2310221 to 1b9ae46
|
Found a possible cause. Testing a fix (which makes sense for other tests, too, and will be applied if proved usable). |
8dd6f99 to 1547db6
|
Out of 5 runs, got a TestAPISwarmLeaderElection failure on z (I haven't patched this test yet). Guess it makes sense to patch all the swarm tests. |
72f830b to 7c3bf5b
|
Patched all the tests, reset the counter, running repeated CI again. Runs 1-5, no failures. Run 6, failure on experimental in |
706c6ad to 6f9b564
|
Runs 7-9: no failures. Run 10/60 [▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure in |
This is repeated 6 times in different tests, with minor variations. Let's factor it out, for clarity. While at it, simplify the code: instead of more complex parsing of "docker swarm init|update --autolock" output (1) and checking if the key is also present in "docker swarm unlock-key" output (2), get the key from (2) and check it is present in (1). Signed-off-by: Kir Kolyshkin <[email protected]>
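A minimal sketch of the simplified check described above, not the helper from the PR. It works on captured command output rather than running docker, and it assumes output (2) contains only the key (as with the `--quiet` flag of `docker swarm unlock-key`). The name `unlockKeyFrom` and the sample key are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unlockKeyFrom takes the key from the "unlock-key" output (2) and verifies
// that the same key appears in the "--autolock" output (1).
func unlockKeyFrom(unlockKeyOutput, autolockOutput string) (string, error) {
	key := strings.TrimSpace(unlockKeyOutput)
	if key == "" {
		return "", errors.New("unlock-key output is empty")
	}
	if !strings.Contains(autolockOutput, key) {
		return "", fmt.Errorf("key %q not found in autolock output", key)
	}
	return key, nil
}

func main() {
	// Both strings are fabricated sample text, not real docker output.
	autolock := "Swarm initialized.\nUnlock key:\n\n    SWMKEY-1-example\n"
	quiet := "SWMKEY-1-example\n"

	key, err := unlockKeyFrom(quiet, autolock)
	fmt.Println(key, err)
}
```

Taking the key from (2) avoids parsing the free-form text of (1); a substring check is enough to confirm both commands agree.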
.. Signed-off-by: Kir Kolyshkin <[email protected]>
3f55b07 to 8faac3e
|
Run 18/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure on power, 00:11:55.420 |
8faac3e to 46f123a
|
Run 19/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Two failures, one on power in 00:19:42.491 and a new failure on z, 00:35:47.832 |
46f123a to af4ec3e
...... Signed-off-by: Kir Kolyshkin <[email protected]>
|
Some previous findings on |
4a5b90b to e9d385d
|
Run 20 -- no failures Run 21/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure on power in 00:33:55.218 |
b168e90 to 5643aed
When starting docker daemons for swarm testing, we disable iptables and use lo for communication (in order to avoid network conflicts). The problem is, these options are lost on restart, which can lead to all sorts of network conflicts and thus connectivity issues between swarm nodes. Fix this.

29/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]

Failures so far:
* TestSwarmLockUnlockCluster ▓▓
* TestAPISwarmLeaderElection ▓▓▓▓
* TestSwarmClusterRotateUnlockKey ▓▓▓
* TestSwarmPublishAdd ▓

Signed-off-by: Kir Kolyshkin <[email protected]>
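One way to keep start options across a restart is to remember them on the daemon object and reuse them when the caller passes none. The sketch below is an illustration under that assumption, not the fix from this commit. The `testDaemon` type is hypothetical, and the two flags are an assumption about how "disable iptables and use lo" is spelled on the dockerd command line.

```go
package main

import "fmt"

// swarmArgs are the options assumed to be needed by every swarm test daemon.
var swarmArgs = []string{"--iptables=false", "--swarm-default-advertise-addr=lo"}

// testDaemon is a stand-in for a test harness daemon. It launches nothing
// and only records what it would have been started with.
type testDaemon struct {
	args    []string // arguments of the most recent start
	running bool
}

// Start records a copy of args so that a later Restart can reuse them.
func (d *testDaemon) Start(args ...string) {
	d.args = append([]string(nil), args...)
	d.running = true
}

// Stop marks the daemon as stopped but keeps the remembered arguments.
func (d *testDaemon) Stop() {
	d.running = false
}

// Restart reuses the remembered arguments unless new ones are given.
func (d *testDaemon) Restart(args ...string) {
	if len(args) == 0 {
		args = d.args
	}
	d.Stop()
	d.Start(args...)
}

func main() {
	d := &testDaemon{}
	d.Start(swarmArgs...)
	d.Restart() // no arguments: the swarm options are carried over
	fmt.Println(d.running, d.args)
}
```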
5643aed to ede2d11
|
@kolyshkin given the "status" and the time without activity on this PR, I'm gonna close it 👼 |
|
Right, thanks. The good part of it is already merged in #38127, the rest is just [futile] attempts to figure out what's going on with flaky swarm tests. |
|
Perhaps we could use |
Just playing with TestSwarmClusterRotateUnlockKey a bit