libnetwork/networkdb: fix most flaky loopback tests#49938
Merged
thaJeztah merged 3 commits intomoby:masterfrom May 8, 2025
Merged
libnetwork/networkdb: fix most flaky loopback tests#49938thaJeztah merged 3 commits intomoby:masterfrom
thaJeztah merged 3 commits intomoby:masterfrom
Conversation
The NetworkDB unit tests instantiate clusters which communicate over loopback where every "node" listens on a distinct localhost port. The tests make use of a NetworkDB configuration knob to set the port. When the NetworkDB configuration's BindPort field is set to a nonzero value, its memberlist instance is configured to bind to the specified port number. However, the advertise port is left at the memberlist.DefaultLANConfig() default value of 7946. Because of this, nodes would be unable to contact any of the other nodes in the cluster learned by gossip as the gossiped addresseses specify the wrong ports! The flaky tests passed as often as they did thanks to the robustness of the memberlist module: NetworkDB gossip and and memberlist node liveness-probe pings to unreachable nodes can all be relayed through the reachable nodes, the nodes on the bootstrap join list. Make the NetworkDB unit tests less flaky by setting each node's advertise port to the bind port. The daemon is unaffected by this oversight as it unconditionally uses the default listen port of 7946, which aligns with the advertise port. Signed-off-by: Cory Snider <[email protected]>
NetworkDB defaults to binding to the unspecified address for gossip communications, with no advertise address set. In this configuration, the memberlist instance listens on all network interfaces and picks one of the host's public IP addresses as the advertise address. The NetworkDB unit tests don't override this default, leaving them vulnerable to flaking out as a result of rogue network traffic perturbing the test, or the inferred advertise address not being useable for loopback testing. And macOS prompts for permission to allow the test executable to listen on public interfaces every time it is rebuilt. Modify the NetworkDB tests to explicitly bind to, advertise, and join ports on 127.0.0.1 to make the tests more robust to flakes in CI and more convenient to run locally. Signed-off-by: Cory Snider <[email protected]>
The loopback-test fixes seem to be sufficient to resolve the flakiness of all the tests aside from TestFlakyNetworkDBIslands. Signed-off-by: Cory Snider <[email protected]>
robmry
approved these changes
May 7, 2025
thaJeztah
reviewed
May 8, 2025
| db := launchNode(t, localConfig) | ||
| if i != 0 { | ||
| assert.Check(t, db.Join([]string{fmt.Sprintf("localhost:%d", db.config.BindPort-1)})) | ||
| assert.Check(t, db.Join([]string{net.JoinHostPort(db.config.AdvertiseAddr, strconv.Itoa(db.config.BindPort-1))})) |
Member
There was a problem hiding this comment.
would be great if the db had an accessor to get what it's listening on (I was looking at the -1 a few minutes before realising this is to join the "previous" DB instance in the loop).
Contributor
Author
There was a problem hiding this comment.
db.config.BindPort is that accessor. The -1 is to get db N to join the cluster by dialing db N-1's address.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
- What I did
TestFlakyNetworkDBIslands.- How I did it
See commit messages
- How to verify it
Trial by fire: remove the flaky flag from the tests
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)