[WIP] Run test daemons in a network namespace.#39579

Closed
cpuguy83 wants to merge 3 commits into moby:master from cpuguy83:tests_netns_per_daemon

Conversation

@cpuguy83
Member

This isolates test daemons further, particularly for global resources
such as iptables and port-space.

Comment thread internal/test/daemon/daemon_unix.go Outdated
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch 8 times, most recently from 9430b0b to 8f2cde6 Compare July 25, 2019 22:54
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch from 8f2cde6 to eab38f1 Compare July 26, 2019 19:14
@cpuguy83
Member Author

Ok, generally pretty close here, but I am running into issues with things hitting localhost.
It seems I'm in the netns, but anything on localhost (a daemon, a registry, whatever) doesn't seem to work, so something's not right there.
Before I spend too much more time debugging this (or maybe someone else can point out what's going on), PTAL and let me know if this is worth the change.

ping @tonistiigi @thaJeztah @arkodg @tiborvass

@cpuguy83
Member Author

@tonistiigi There are a couple of places where I have a proxy server set up. Some TLS tests make this a bit annoying.

@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch from eab38f1 to 37e35f6 Compare July 31, 2019 00:11
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch 5 times, most recently from ff0584f to fc3c4ba Compare August 2, 2019 22:40
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch 4 times, most recently from 8a26d67 to 582598f Compare August 7, 2019 01:36
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch 2 times, most recently from e2ed291 to f539934 Compare August 7, 2019 02:33
@cpuguy83
Member Author

cpuguy83 commented Aug 7, 2019

Is it expected that janky is not running integration tests? @thaJeztah @tiborvass

@cpuguy83
Member Author

cpuguy83 commented Aug 7, 2019

Never mind, I forgot I cherry-picked in a fix for the flaky test detector, and that fix was apparently not right.

@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch 4 times, most recently from 2f6f167 to 88cd472 Compare August 7, 2019 23:33
This isolates test daemons further, particularly for global resources
such as iptables and port-space.

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the tests_netns_per_daemon branch from 88cd472 to cf19ef6 Compare August 8, 2019 20:03
@cpuguy83
Member Author

cpuguy83 commented Aug 8, 2019

Well, I think this would be mostly ready, except tests are taking WAY longer to run and I do not know why.
The namespace configuration overhead is negligible (for both creating and deleting), and the actual daemon startup (process start -> API ready) is also minimal.
Tests are communicating over the unix socket, so there is not even bridge overhead there.

The slowest netns config I've seen is 4s (creating the ns and setting up the bridge), and that is an outlier; most are under 1s. Meanwhile, tests that should take 2s are taking 12s (as an example).
This is extremely evident in the integration/service tests (which are timing out).

Example from this PR:

--- PASS: TestDockerNetworkReConnect (21.41s)
    network_test.go:83: Creating a new daemon
    net_linux.go:62: [d3541f40c37fb] creating network namespace
    net_linux.go:75: [d3541f40c37fb] setting up cni networking
    net_linux.go:49: [d3541f40c37fb] configureNetNS duration: 2.269827
    daemon.go:363: [d3541f40c37fb] waiting for daemon to start
    daemon.go:363: [d3541f40c37fb] waiting for daemon to start
    daemon.go:391: [d3541f40c37fb] daemon started
    daemon.go:324: [d3541f40c37fb] daemon startup time: 0.740407
    daemon.go:501: [d3541f40c37fb] Stopping daemon
    daemon.go:331: [d3541f40c37fb] exiting daemon
    daemon.go:488: [d3541f40c37fb] Daemon stopped
    net_linux.go:98: [d3541f40c37fb] cleaning up cni networking
    net_linux.go:103: [d3541f40c37fb] removing network namespace
    net_linux.go:96: [d3541f40c37fb] netns cleanup duration: 0.182375s

And another PR:

11:56:06 --- PASS: TestDockerNetworkReConnect (3.31s)
11:56:06     network_test.go:82: Creating a new daemon
11:56:06     daemon.go:336: [d2859c14b39fa] waiting for daemon to start
11:56:06     daemon.go:336: [d2859c14b39fa] waiting for daemon to start
11:56:06     daemon.go:364: [d2859c14b39fa] daemon started
11:56:06     daemon.go:472: [d2859c14b39fa] Stopping daemon
11:56:06     daemon.go:307: [d2859c14b39fa] exiting daemon
11:56:06     daemon.go:459: [d2859c14b39fa] Daemon stopped

@cpuguy83
Member Author

cpuguy83 commented Aug 8, 2019

It seems that Cleanup now calls Stop, and we always call Cleanup.

@cpuguy83
Member Author

cpuguy83 commented Aug 8, 2019

🙁

16:36:06 --- PASS: TestServiceUpdateConfigs (44.53s)
16:36:06     update_test.go:141: Creating a new daemon
16:36:06     net_linux.go:60: [df6f03afbbf1a] creating network namespace
16:36:06     net_linux.go:73: [df6f03afbbf1a] setting up cni networking
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon configureNetNS duration: 0.039535s
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon configureNetNS duration: 0.000000s
16:36:06     daemon.go:375: [df6f03afbbf1a] waiting for daemon to start
16:36:06     daemon.go:375: [df6f03afbbf1a] waiting for daemon to start
16:36:06     daemon.go:403: [df6f03afbbf1a] daemon started
16:36:06     daemon.go:336: [df6f03afbbf1a] daemon startup time: 0.507045
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon daemon start duration: 0.507097s
16:36:06     daemon.go:514: [df6f03afbbf1a] Stopping daemon
16:36:06     daemon.go:343: [df6f03afbbf1a] exiting daemon
16:36:06     daemon.go:501: [df6f03afbbf1a] Daemon stopped
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon daemon stop duration: 0.481983s
16:36:06     net_linux.go:93: [df6f03afbbf1a] cleaning up cni networking
16:36:06     net_linux.go:98: [df6f03afbbf1a] removing network namespace
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon daemon nsnet cleanup duration: 0.053191s
16:36:06     daemon.go:262: [df6f03afbbf1a] Daemon cleanup duration: 0.535373s

As compared to master:

11:57:09 --- PASS: TestServiceUpdateConfigs (14.67s)
11:57:09     update_test.go:139: Creating a new daemon
11:57:09     daemon.go:336: [def447ee5e7b6] waiting for daemon to start
11:57:09     daemon.go:336: [def447ee5e7b6] waiting for daemon to start
11:57:09     daemon.go:364: [def447ee5e7b6] daemon started
11:57:09     daemon.go:472: [def447ee5e7b6] Stopping daemon
11:57:09     daemon.go:307: [def447ee5e7b6] exiting daemon
11:57:09     daemon.go:459: [def447ee5e7b6] Daemon stopped

@cpuguy83
Member Author

cpuguy83 commented Oct 3, 2019

Going to close this, as there's some as-yet-unexplained performance impact here; it also ended up being a really complex change...

Maybe someone else wants to pick this up?

6 participants