Description
- relates to vendor: github.com/vishvananda/netlink v1.3.0 #48368
- relates to update to go1.22.6 #46982 (comment)
- relates to Fix recvfrom goroutine leak vishvananda/netlink#793 (comment)
github.com/vishvananda/netlink was updated to v1.3.0 in #46982, but the update resulted in flakiness in CI:
Error initializing network controller: list bridge addresses failed: interrupted system call
Upon first look, it was suggested that this was due to a missing condition for handling EINTR; #46982 (comment)
EINTR on netlink sockets is a new one. I suspect it has more to do with the netlink dependency bump you pulled in when rebasing than on the Go toolchain bump. I think the bug is here: https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L821-L824 It retries on EWOULDBLOCK (a.k.a. EAGAIN) but neglects to retry on EINTR.
However, it may be because of our use of SetSocketTimeout; see vishvananda/netlink#793 (comment)
IO calls on non-blocking sockets will never return -EINTR. The problem here is that Moby calls SetSocketTimeout, which sets SO_SNDTIMEO and SO_RCVTIMEO. These socket options are only useful for sockets in blocking mode. Setting these probably places the socket back into blocking mode.
moby/libnetwork/ns/init_linux.go
Lines 25 to 40 in 92a05cf
    // Init initializes a new network namespace
    func Init() {
    	var err error
    	initNs, err = netns.Get()
    	if err != nil {
    		log.G(context.TODO()).Errorf("could not get initial namespace: %v", err)
    	}
    	initNl, err = netlink.NewHandle(getSupportedNlFamilies()...)
    	if err != nil {
    		log.G(context.TODO()).Errorf("could not create netlink handle on initial namespace: %v", err)
    	}
    	err = initNl.SetSocketTimeout(NetlinkSocketsTimeout)
    	if err != nil {
    		log.G(context.TODO()).Warnf("Failed to set the timeout on the default netlink handle sockets: %v", err)
    	}
    }

I'm not very familiar with this project, but it seems to me that before this PR, only blocking IO is used. This PR adds calls to set the socket as non-blocking, but still allows setting the timeout socket options.
Those SetSocketTimeout uses were added in moby/libnetwork@f459afb
- Set a timeout to the netlink handle sockets libnetwork#1557
- (moby PR: Porting libnetwork fixes #29004)
to address
- netlink calls should have timeouts libnetwork#1474
- too many libnetwork-setkey process suspending, create new container timeout due to possible dead lock libnetwork#2208
Also related: