
Wait for a route to ff02::1 before sending NAs#49392

Merged
vvoland merged 1 commit into moby:master from robmry:wait_for_mcast_route
Feb 6, 2025

Conversation

@robmry
Contributor

@robmry robmry commented Feb 4, 2025

- What I did

Commit aa3a23d (Temporary debug for unsolicited NA) added code to check for a route from ::1 to ff02::1 if a neighbour advertisement send failed ... hoping to show that the route appeared fairly quickly after the first attempt.

The failure didn't happen in CI; perhaps other changes in that PR changed the timing enough for it not to be an issue. Or perhaps the change in commit 9a6e96f (Before sending ARPs/NAs, check the bridge is ready) was enough to solve the issue.

But, in case it's a timing issue we're just no longer seeing on the CI hosts - try to check the route exists before trying to send an NA, and wait for a short time if not.

- How I did it

After waitForIfUpped and waitForBridgePort, if there's an IPv6 address to advertise - waitForMcastRoute as well.

Unfortunately, for reasons I don't yet understand and can't repro locally, in CI the netlink RouteGetWithOptions call sometimes returns EMSGSIZE. Once it does, it's persistent (the retries don't help). I think that normally means too many attributes were added to a request/response - but, in this case, the request is always the same, the response should have zero or one routes and, even if it had more, they should be split into multiple responses.

But ... the NA send appears to work anyway (at least, there's no write ip ::1->ff02::1: sendmsg: network is unreachable error).

So, waitForMcastRoute is just best-effort. If it sees EMSGSIZE, it doesn't delay things any further. If the route is still not set up after max retries, it just carries on.

Another option would be to just delete the route check and see if the network-unreachable error crops up in the wild. But, I think this approach probably gives us the best chance of avoiding issues in 28.0. There's still a lot to understand though - why wasn't the route set up when the original errors were seen, and why the EMSGSIZE errors? The route from ::1 shouldn't really be necessary, the NA just needs to be fired out of the new interface. But, Go's x/net/ipv6 code appears to need it - perhaps there's a better way to implement the send.

- How to verify it

Tricky, because I wasn't able to repro the write ip ::1->ff02::1: sendmsg: network is unreachable or EMSGSIZE errors locally, or confirm the cause of the original net-unreachable error with extra debug under CI.

But, nothing extra is logged in a normal run - the route exists before the first check. And, by nobbling the code to loop until the timeout (and to keep going past the EMSGSIZE errors) - checked that the timeout worked.

- Human readable description for the release notes

@robmry robmry added the area/networking, kind/bugfix (PRs that fix bugs), and area/networking/ipv6 labels Feb 4, 2025
@robmry robmry added this to the 28.0.0 milestone Feb 4, 2025
@robmry robmry self-assigned this Feb 4, 2025
@robmry robmry force-pushed the wait_for_mcast_route branch from f0740ac to ad9aa69 Compare February 4, 2025 17:35
@robmry robmry force-pushed the wait_for_mcast_route branch from ad9aa69 to c825bc6 Compare February 5, 2025 09:59
Commit aa3a23d (Temporary debug for unsolicited NA) added code to
check for a route from ::1 to ff02::1 if a neighbour advertisement
send failed ... hoping to show that the route appeared fairly quickly
after the first attempt.

The failure didn't happen in CI, perhaps other changes in that PR
changed the timing enough for it not to be an issue. Or, perhaps the
change in commit 9a6e96f (Before sending ARPs/NAs, check the bridge is
ready) was enough to solve the issue.

But, in case it's a timing issue we're just no longer seeing on the CI
hosts - check the route exists before trying to send an NA, and wait
for a short time if not.

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry force-pushed the wait_for_mcast_route branch from c825bc6 to d4fa252 Compare February 5, 2025 10:57
@robmry
Contributor Author

robmry commented Feb 5, 2025

thompson-shaun requested review from akerouanton, thaJeztah and vvoland [16 hours ago]

Thanks Shaun ... but this PR was/is still draft, work-in-progress.

@robmry robmry marked this pull request as ready for review February 5, 2025 12:12
Comment on lines +557 to +559
const pollInterval = 10 * time.Millisecond
const maxWait = 200 * time.Millisecond
for range int64(maxWait / pollInterval) {
Member


Wondering if something like a ticker and/or ctx, cancel := context.WithTimeout(200 * time.Millisecond) would be something to consider?

I'm horrible at tickers though, as they always confuse me, but I think @vvoland is better at them 😂 🙈

Contributor Author


It's happening in the foreground, so I'm not sure that'd help? I used the same logic in waitForBridgePort, added a few days ago.

ctx, span := otel.Tracer("").Start(ctx, "libnetwork.osl.waitForMcastRoute")
defer span.End()

const pollInterval = 10 * time.Millisecond
Member


if we want to be optimistically re-trying, perhaps some exponential / increasing timeout would work? 10 milliseconds feels long if this is in the hot path, or is this code only ran on startup, not for every network connect / container start?

Contributor Author


It's every network connect - so it'll happen for the initial network connection, then again for every subsequent network connect.

The normal case should be that the route's already ready before the first check. So, no delay apart from the netlink request/response.

But, if the route isn't ready, without a delay the network-connect will just fail. So, waiting for a short time is ok, and waiting for a longer time (still milliseconds) if it's not going to work probably isn't really worse than failing sooner. I think we might as well poll quickly for that whole time, to get going as soon as possible if it does spring into life.

(If the interface doesn't appear to come "up" in waitForIfUpped, the delay is 5s. That's probably too long really, but all it can do is fail if the interface doesn't start, and we've no data on how long these things take on slow/busy hosts.)

Member

@thaJeztah thaJeztah left a comment


LGTM, but perhaps could use a revisit for some of the things

@vvoland vvoland merged commit f353b56 into moby:master Feb 6, 2025
@robmry robmry deleted the wait_for_mcast_route branch February 26, 2025 14:58
@uweber

uweber commented Mar 6, 2025

FYI: This broke our high availability container setup.
We attach two ethernet interfaces via macvlan to a container, where only one interface has link state up at a time.
Since one interface has no physical link, it has no working IPv6 address and setup fails with:
failed to set up container networking: failed to add interface vethcc2aa94 to sandbox: failed to advertise addresses: write ip ::1->ff02::1: sendmsg: network is unreachable

@robmry
Contributor Author

robmry commented Mar 6, 2025

> FYI: This broke our high availability container setup. We attach two ethernet interfaces via macvlan to a container, where only one interface has link state up at a time. Since one interface has no physical link, the interface has no working IPv6 address and setup fails with: failed to set up container networking: failed to add interface vethcc2aa94 to sandbox: failed to advertise addresses: write ip ::1->ff02::1: sendmsg: network is unreachable

Hi @uweber - thank you for letting us know ... so, we should just log instead in this case. I'll take a look.

I've raised this as #49593
