Skip to content

Integration: cleanup custom "docker0" bridge after tests#39571

Open
thaJeztah wants to merge 4 commits intomoby:masterfrom
thaJeztah:cleanup_docker0
Open

Integration: cleanup custom "docker0" bridge after tests#39571
thaJeztah wants to merge 4 commits intomoby:masterfrom
thaJeztah:cleanup_docker0

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented Jul 19, 2019

hopefully addresses #39570

this also reverts 6aafe0f (added in #39068)

DockerDaemonSuite: cleanup "docker0" in TearDownTest

Some tests in the DockerDaemonSuite make modifications to the "docker0" bridge, which affects
tests running after them if they do not delete and re-create the interface.

This may also explain the test_init_swarm_data_path_addr in docker-py failing:

E docker.errors.APIError: 400 Client Error: Bad Request ("interface eth0 has more than one IPv6 address (2001:db8:1::242:ac11:2 and fe80::42:acff:fe11:2)")

Which may be caused by TestDaemonIPv6FixedCIDRAndMac :

s.d.StartWithBusybox(c, "--ipv6", "--fixed-cidr-v6=2001:db8:1::/64")

The following tests remove the docker0 interface:

  • DockerDaemonSuite.TestDaemonIPv6FixedCIDR
  • DockerDaemonSuite.TestDaemonIPv6FixedCIDRAndMac
  • DockerDaemonSuite.TestDaemonIPv6HostMode
  • DockerDaemonSuite.TestDaemonBridgeIP
  • DockerDaemonSuite.TestDaemonDefaultGatewayIPv4Implicit
  • DockerDaemonSuite.TestDaemonDefaultGatewayIPv4Explicit
  • DockerDaemonSuite.TestDaemonDefaultGatewayIPv4ExplicitOutsideContainerSubnet
  • DockerDaemonSuite.TestDaemonDefaultNetworkInvalidClusterConfig
  • DockerDaemonSuite.TestBridgeIPIsExcludedFromAllocatorPool

This patch removes docker0 after each test in the DockerDaemonSuite, to force
the next test to re-create it with the default configuration.

integration/network: cleanup custom "docker0" bridge

These tests modify the "docker0" bridge. Use a consistent approach to make sure the modified bridge is cleaned up afterwards.

@thaJeztah
Copy link
Copy Markdown
Member Author

@arkodg @andrewhsu PTAL

@thaJeztah thaJeztah changed the title DockerDaemonSuite: cleanup "docker0" in TearDownTest Integration: cleanup custom "docker0" bridge after tests Jul 19, 2019
@thaJeztah thaJeztah force-pushed the cleanup_docker0 branch 4 times, most recently from 3479244 to 1de0c5f Compare July 19, 2019 11:44
@thaJeztah
Copy link
Copy Markdown
Member Author

thaJeztah commented Jul 19, 2019

Interesting failure on WindowsRS5: https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS5-Process/3063/console

11:59:47 --- FAIL: TestHealthKillContainer (8.12s)
11:59:47     health_test.go:57: assertion failed: error is not nil: Error response from daemon: Invalid signal: SIGUSR1

That test was added recently in #39454, but rewritten in a commit in the same PR: f8aef6a

In that rewrite, there were some changes:

  • originally it was skipped on Windows, but the rewritten test doesn't have that skip:

    testRequires(c, DaemonIsLinux) // busybox doesn't work on Windows
  • the original test used SIGINT, but the new one uses SIGUSR1

@thaJeztah
Copy link
Copy Markdown
Member Author

This failure looks genuine; need to look into that https://jenkins.dockerproject.org/job/Docker-PRs-experimental/46047/console

12:52:28 FAIL: docker_cli_daemon_test.go:1778: DockerDaemonSuite.TestDaemonNoSpaceLeftOnDeviceError
12:52:28 
12:52:28 Creating a new daemon
12:52:28 assertion failed: 
12:52:28 Command:  /usr/local/cli/docker run --rm -v /tmp/no-space-left-on-device-test702266309:/test busybox sh -c dd of=/test/testfs.img bs=1M seek=3 count=0
12:52:28 ExitCode: 125
12:52:28 Error:    exit status 125
12:52:28 Stdout:   
12:52:28 Stderr:   /usr/local/cli/docker: Error response from daemon: failed to create endpoint practical_curran on network bridge: adding interface vethe57be3f to bridge docker0 failed: could not find bridge docker0: route ip+net: no such network interface.
12:52:28 
12:52:28 
12:52:28 Failures:
12:52:28 ExitCode was 125 expected 0
12:52:28 Expected no error

@thaJeztah
Copy link
Copy Markdown
Member Author

Fixed TestDaemonNoSpaceLeftOnDeviceError #39571 (comment), but will wait with pushing until CI completes to see if there's other failures.

make DOCKER_GRAPHDRIVER=vfs TESTDIRS='github.com/docker/docker/integration-cli' TESTFLAGS='-check.f DockerDaemonSuite.TestDaemonNoSpaceLeftOnDeviceError' binary test-integration-cli

Running integration-test (iteration 1)
Running /go/src/github.com/docker/docker/integration-cli
Loaded image: buildpack-deps:jessie
Loaded image: busybox:latest
Loaded image: busybox:glibc
Loaded image: debian:jessie
Loaded image: hello-world:latest
INFO: Testing against a local daemon
PASS: docker_cli_daemon_test.go:1778: DockerDaemonSuite.TestDaemonNoSpaceLeftOnDeviceError	14.673s
OK: 1 passed
PASS
---> Making bundle: .integration-daemon-stop (in bundles/test-integration)

We'll also have to discuss if these tests should actually remove "docker0", because (as seen) that affects the main daemon. Probably they should create a new bridge, and use that for the new daemon that is spun up as part of the tests.

@thaJeztah
Copy link
Copy Markdown
Member Author

OK, good news: that's the only failing test;

13:51:01 OOPS: 1467 passed, 53 skipped, 1 FAILED
13:51:01 --- FAIL: Test (5901.89s)

I'll push the fix for that test, and start working on using random interface-names (I'll make that a follow-up PR)

Copy link
Copy Markdown
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jul 19, 2019

looking at this, should we also consider deleting and recreating docker0 every time dockerd gracefully exits and gets created

@thaJeztah
Copy link
Copy Markdown
Member Author

Alas; it did not yet solve the problem with docker-py, so looks like there's still more to look into;

15:56:42 Err http://deb.debian.org jessie InRelease
15:56:42   
15:56:42 Err http://security.debian.org jessie/updates InRelease
15:56:42   
15:56:42 Err http://security.debian.org jessie/updates Release.gpg
15:56:42   Temporary failure resolving 'security.debian.org'
15:56:42 Err http://deb.debian.org jessie Release.gpg
15:56:42   Temporary failure resolving 'deb.debian.org'
15:56:42 Reading package lists...
15:56:42 W: Failed to fetch http://deb.debian.org/debian/dists/jessie/InRelease  
15:56:42 
15:56:42 W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease  
15:56:42 
15:56:42 W: Failed to fetch http://deb.debian.org/debian/dists/jessie/Release.gpg  Temporary failure resolving 'deb.debian.org'
15:56:42 
15:56:42 W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/Release.gpg  Temporary failure resolving 'security.debian.org'
15:56:42 
15:56:42 W: Some index files failed to download. They have been ignored, or old ones used instead.
15:56:42 Reading package lists...

@thaJeztah
Copy link
Copy Markdown
Member Author

looking at this, should we also consider deleting and recreating docker0 every time dockerd gracefully exits and gets created

Perhaps; I'm not sure if there's side effects in doing so?
I went looking at setting a unique name for the interface when starting the daemons in our tests, but some changes are needed in libnetwork do do so (was thinking to use the daemon's ID for that, as it's 12/13 characters so suitable as interface name (which I think have a maximum of 15 characters).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh. I need to figure out how to prevent my IDE from adding these unneeded aliases 😂

@thaJeztah
Copy link
Copy Markdown
Member Author

Dropped the revert commit of moving docker-py back to the end

@thaJeztah
Copy link
Copy Markdown
Member Author

RS1; https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/26003/console

18:58:13 ERROR: Failed 'ERROR: Failed to load d:\baseimages\windowsservercore.tar' at 07/22/2019 18:58:13
18:58:13 At C:\gopath\src\github.com\docker\docker\hack\ci\windows.ps1:343 char:17
18:58:13 + ...             Throw $("ERROR: Failed to load $readBaseFrom`:\baseimages ...
18:58:13 +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
18:58:13 

@thaJeztah
Copy link
Copy Markdown
Member Author

rebased

@thaJeztah
Copy link
Copy Markdown
Member Author

thaJeztah commented Jul 29, 2019

doh! declared in a _unix file, not a _linux file 🤦‍♂

Windows RS1 https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/26084/console

18:49:15 INFO: Integration CLI tests being run from the host:
18:49:15 INFO: go test "-check.v" "-tags" "autogen" "-check.timeout" "10m" "-test.timeout" "200m" 
18:49:19 # github.com/docker/docker/integration-cli [github.com/docker/docker/integration-cli.test]
18:49:19 .\check_windows_test.go:5:6: createInterface redeclared in this block
18:49:19 	previous declaration at .\check_unix_test.go:8:70
18:49:19 .\check_windows_test.go:7:6: deleteInterface redeclared in this block
18:49:19 	previous declaration at .\check_unix_test.go:13:41
18:49:19 FAIL	github.com/docker/docker/integration-cli [build failed]
18:49:19 
18:49:19 
18:49:19 ERROR: Failed 'ERROR: Integration CLI tests failed at 07/29/2019 18:49:19. Duration:00:00:04.5001759' at 07/29/2019

Windows RS5 https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS5-Process/3180/console

18:50:23 INFO: Integration CLI tests being run from the host:
18:50:23 INFO: go test "-check.v" "-tags" "autogen" "-check.timeout" "10m" "-test.timeout" "200m" 
18:50:28 FAIL	github.com/docker/docker/integration-cli [build failed]
18:50:28 
18:50:28 
18:50:28 ERROR: Failed 'ERROR: Integration CLI tests failed at 07/29/2019 18:50:28. Duration:00:00:05.2970034' at 07/29/2019 

I wonder why the RS5 CI doesn't show the actual error here

@thaJeztah
Copy link
Copy Markdown
Member Author

thaJeztah commented Jul 29, 2019

New Jenkins is failing on; https://ci.docker.com/public/blue/organizations/jenkins/moby/detail/PR-39571/2/pipeline

GitHub has been notified of this commit’s build result

hudson.plugins.git.GitException: Command "git checkout -f c0b17564c20d86fe4f81869aa249645390d8311d" returned status code 128:
stdout: 
stderr: fatal: Unable to create '/var/jenkins_home/workspace/moby_PR-39571-CJZCXQE2GIZRBIHP5KGB75GWRLHUQJ3XJOD3ZSO6PNDT5M5A73WQ@script/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2016)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$800(CliGitAPIImpl.java:72)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:2315)
Caused: hudson.plugins.git.GitLockFailedException: Could not lock repository. Please try again
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:2334)
	at jenkins.plugins.git.MergeWithGitSCMExtension.checkout(MergeWithGitSCMExtension.java:146)
	at jenkins.plugins.git.MergeWithGitSCMExtension.decorateRevisionToBuild(MergeWithGitSCMExtension.java:112)
	at hudson.plugins.git.GitSCM.determineRevisionToBuild(GitSCM.java:1100)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1193)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:120)
	at org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(CpsScmFlowDefinition.java:144)
	at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:120)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:293)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)
Finished: FAILURE

@thaJeztah
Copy link
Copy Markdown
Member Author

Two failures on experimental; looks like flaky, but will run again https://jenkins.dockerproject.org/job/Docker-PRs-experimental/46172/console

21:44:04 FAIL: docker_api_swarm_service_test.go:34: DockerSwarmSuite.TestAPIServiceUpdatePort
21:44:04 
21:44:04 Creating a new daemon
21:44:04 [df2300d3b942d] waiting for daemon to start
21:44:04 [df2300d3b942d] waiting for daemon to start
21:44:04 [df2300d3b942d] daemon started
21:44:04 
21:44:04 docker_api_swarm_service_test.go:40:
21:44:04     waitAndAssert(c, defaultReconciliationTimeout, d.CheckActiveContainerCount, checker.Equals, 1)
21:44:04 docker_utils_test.go:435:
21:44:04     c.Assert(v, checker, args...)
21:44:04 ... obtained int = 0
21:44:04 ... expected int = 1
21:44:04 
21:44:04 waited for 30.098481833s (out of 30s)
21:44:04 [df2300d3b942d] Stopping daemon
21:44:04 [df2300d3b942d] exiting daemon
21:44:04 [df2300d3b942d] Daemon stopped
22:09:10 FAIL: docker_cli_swarm_test.go:1563: DockerSwarmSuite.TestSwarmPublishDuplicatePorts
22:09:10 
22:09:10 Creating a new daemon
22:09:10 [d3da8c32f3998] waiting for daemon to start
22:09:10 [d3da8c32f3998] waiting for daemon to start
22:09:10 [d3da8c32f3998] daemon started
22:09:10 
22:09:10 docker_cli_swarm_test.go:1571:
22:09:10     // make sure task has been deployed.
22:09:10     waitAndAssert(c, defaultReconciliationTimeout, d.CheckActiveContainerCount, checker.Equals, 1)
22:09:10 docker_utils_test.go:435:
22:09:10     c.Assert(v, checker, args...)
22:09:10 ... obtained int = 0
22:09:10 ... expected int = 1
22:09:10 
22:09:10 waited for 30.046866866s (out of 30s)
22:09:10 [d3da8c32f3998] Stopping daemon
22:09:10 [d3da8c32f3998] exiting daemon
22:09:10 [d3da8c32f3998] Daemon stopped

@cpuguy83
Copy link
Copy Markdown
Member

With #39579 this will not be necessary.
Despite all the red, very close here. Issues are reduced to just 1 or 2 common problems.

Comment thread integration/network/service_test.go Outdated
Comment on lines +207 to +217
assert.Equal(t, out.IPAM.Config[0].Subnet, "172.60.0.0/16")
delInterface(t, defaultNetworkBridge)
assert.Equal(t, out.IPAM.Config[0].Subnet, "172.60.0.1/16")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self (I'm rebasing this one); check why I changed this from "172.60.0.0/16" to "172.60.0.1/16"

It's doing the same as DeleteInterface in the same package

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Signed-off-by: Sebastiaan van Stijn <[email protected]>
Some tests in the `DockerDaemonSuite` make modifications to the "docker0" bridge, which affects
tests running after them if they do not delete and re-create the interface.

This may also explain the `test_init_swarm_data_path_addr ` in `docker-py` failing:

```
E   docker.errors.APIError: 400 Client Error: Bad Request ("interface eth0 has more than one IPv6 address (2001:db8:1::242:ac11:2 and fe80::42:acff:fe11:2)")
```

Which may be caused by `TestDaemonIPv6FixedCIDRAndMac `:

```
s.d.StartWithBusybox(c, "--ipv6", "--fixed-cidr-v6=2001:db8:1::/64")
```

The following tests remove the `docker0` interface:

- `DockerDaemonSuite.TestDaemonIPv6FixedCIDR`
- `DockerDaemonSuite.TestDaemonIPv6FixedCIDRAndMac`
- `DockerDaemonSuite.TestDaemonIPv6HostMode`
- `DockerDaemonSuite.TestDaemonBridgeIP`
- `DockerDaemonSuite.TestDaemonDefaultGatewayIPv4Implicit`
- `DockerDaemonSuite.TestDaemonDefaultGatewayIPv4Explicit`
- `DockerDaemonSuite.TestDaemonDefaultGatewayIPv4ExplicitOutsideContainerSubnet`
- `DockerDaemonSuite.TestDaemonDefaultNetworkInvalidClusterConfig`
- `DockerDaemonSuite.TestBridgeIPIsExcludedFromAllocatorPool`

This patch removes `docker0` after each test in the `DockerDaemonSuite`, to force
the next test to re-create it with the default configuration.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
These tests modify the "docker0" bridge. Use a consistent approach
to make sure the modified bridge is cleaned up afterwards.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants